Noise and Heterogeneity in Historical Build Data: An Empirical Study of Travis CI


Noise and Heterogeneity in Historical Build Data: An Empirical Study of Travis CI

Keheliya Gallaba (McGill University, Montréal, Canada), Christian Macho (University of Klagenfurt, Klagenfurt, Austria), Martin Pinzger (University of Klagenfurt, Klagenfurt, Austria), Shane McIntosh (McGill University, Montréal, Canada)

ABSTRACT
Automated builds, which may pass or fail, provide feedback to a development team about changes to the codebase. A passing build indicates that the change compiles cleanly and tests (continue to) pass. A failing (a.k.a., broken) build indicates that there are issues that require attention. Without a closer analysis of the nature of build outcome data, practitioners and researchers are likely to make two critical assumptions: (1) build results are not noisy; however, passing builds may contain failing or skipped jobs that are actively or passively ignored; and (2) builds are equal; however, builds vary in terms of the number of jobs and configurations. To investigate the degree to which these assumptions about build breakage hold, we perform an empirical study of 3.7 million build jobs spanning 1,276 open source projects. We find that: (1) 12% of passing builds have an actively ignored failure; (2) 9% of builds have a misleading or incorrect outcome on average; and (3) at least 44% of the broken builds contain passing jobs, i.e., the breakage is local to a subset of build variants. Like other software archives, build data is noisy and complex. Analysis of build data requires nuance.

CCS CONCEPTS
• Software and its engineering → Software verification and validation; Software post-development issues;

KEYWORDS
Automated Builds, Build Breakage, Continuous Integration

ACM Reference Format:
Keheliya Gallaba, Christian Macho, Martin Pinzger, and Shane McIntosh. 2018. Noise and Heterogeneity in Historical Build Data: An Empirical Study of Travis CI. In Proceedings of the 2018 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE '18), September 3–7, 2018, Montpellier, France. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3238147.3238171

1 INTRODUCTION
After making source code changes, developers execute automated builds to check the impact on the software product. These builds are triggered while features are being developed, when changes have been submitted for peer review, and/or prior to integration into the software project's version control system.

Tools such as Travis CI facilitate the practice of Continuous Integration (CI), where code changes are downloaded regularly onto dedicated servers to be compiled and tested [1]. The popularity of development platforms such as GitHub and CI services such as Travis CI have made the data about automated builds from a plethora of open source projects readily available for analysis.

Characterizing build outcome data will help software practitioners and researchers when building tools and proposing techniques to solve software engineering problems. For example, Rausch et al. [18] identified the most common breakage types in 14 Java applications and Vassallo et al. [26] compared breakages from 349 open source Java projects to those of a financial organization. While these studies make important observations, understanding the nuances and complexities of build outcome data has not received sufficient attention by software engineering researchers. Early work by Zolfagharinia et al. [29] shows that build failures in the Perl project tend to be time- and platform-sensitive, suggesting that interpretation of build outcome data is not straightforward.

To support interpretation of build outcome data, in this paper, we set out to characterize build outcome data according to two harmful assumptions that one may make. To do so, we conduct an empirical study of 3,702,071 build results spanning 1,276 open source projects that use the Travis CI service.

Noise. First, one may assume that build outcomes are free of noise. However, we find that in practice, some builds that are marked as successful contain breakages that need attention yet are ignored. For example, developers may label platforms in their Travis CI configurations as allow_failure to enable experimentation with support for a new platform. The expectation is that once platform support has stabilized, developers will remove allow_failure; however, this is not always the case. For example, the zdavatz/spreadsheet project¹ has had the allow_failure feature enabled for the entire lifetime of the project (five years). Examples like this suggest that noise is likely present in build outcome data.

¹https://github.com/zdavatz/spreadsheet
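The allow_failure mechanism described above lives in a project's .travis.yml. The snippet below is a minimal sketch, not the authors' tooling, of how such "actively ignored" job failures could be spotted by parsing a Travis CI configuration; the key names (matrix.allow_failures / jobs.allow_failures) follow the commonly documented Travis CI schema, and the helper name and example file are ours.

```python
# Sketch: detect allow_failure entries in a Travis CI configuration.
# Assumes PyYAML is installed; allow_failures may appear under either
# the "matrix" or the "jobs" section of .travis.yml.
import yaml

def allowed_failures(travis_yml_text):
    """Return the list of job selectors whose failures are tolerated."""
    config = yaml.safe_load(travis_yml_text) or {}
    entries = []
    for section in ("matrix", "jobs"):
        entries.extend((config.get(section) or {}).get("allow_failures", []) or [])
    return entries

example = """
language: ruby
rvm: [2.2, 2.3]
matrix:
  allow_failures:
    - rvm: 2.3   # failures on this variant will not break the build
"""

if __name__ == "__main__":
    tolerated = allowed_failures(example)
    # A "passing" build may therefore hide failing jobs on rvm 2.3.
    print(tolerated)  # -> [{'rvm': 2.3}]
```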
There are also builds that are marked as broken that do not receive the immediate attention of the development team. It is unlikely that such broken builds are as distracting for development teams as one may assume. For example, we find that on average, two in every three breakages are stale, i.e., occur multiple times in a project's build history. To quantify the amount of noise in build outcome data, we propose an adapted signal-to-noise ratio.

Heterogeneity. Second, one may assume that builds are homogeneous. However, builds vary in terms of the number of executed jobs and the number of supported build-time configurations. For example, if the Travis CI configuration includes four Ruby versions and three Java versions to be tested, twelve jobs will be created per build because 4 × 3 combinations are possible. Zolfagharinia et al. [29] observed that automated builds for Perl package releases take place on a median of 22 environments and seven operating systems. Builds also vary in terms of the type of contributor. Indeed, build outcome and team response may differ depending on the role of the contributor (core, peripheral).
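The twelve-job example above is a Cartesian product over the configured language versions. Below is a minimal sketch of that expansion; only the counts (four Ruby versions, three Java versions) come from the text, while the specific version strings and the function name are made up for illustration.

```python
# Sketch: how a build matrix expands into one job per configuration.
# 4 Ruby versions x 3 JDK versions -> 12 jobs, mirroring the paper's example.
from itertools import product

def expand_matrix(ruby_versions, jdk_versions):
    """Return one (ruby, jdk) pair per build job."""
    return list(product(ruby_versions, jdk_versions))

jobs = expand_matrix(["2.2", "2.3", "2.4", "2.5"],
                     ["openjdk8", "openjdk9", "openjdk10"])
print(len(jobs))  # -> 12; a single "build" outcome summarizes all twelve jobs
```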
In this paper, we study build heterogeneity according to matrix breakage purity, breakage reasons, and contributor type. We find that (1) environment-specific breakages are as common as environment-agnostic breakages; (2) the reasons for breakage vary and can be classified into five categories and 24 subcategories; and (3) broken builds that are caused by core contributors tend to be fixed sooner than those of peripheral contributors.

Take-away messages. Build outcome data is noisy and heterogeneous in practice. If build outcomes are treated as the ground truth, this noise will likely impact subsequent analyses. Therefore, researchers should filter out noise in build outcome data before conducting further analyses. Moreover, tool developers and researchers who develop and propose solutions based on build outcome data need to take the heterogeneity of builds into account.

In summary, this paper makes the following contributions:
• An empirical study of noise and heterogeneity of build breakage in a large sample of Travis CI builds.
• A replication package containing Travis CI specification files, metadata, build logs at the job level, and our data ex…

2.1 Corpus of Candidate Systems
We conduct this study by using openly available project metadata and build results of GitHub projects that use the Travis CI service to automate their builds. GitHub is the world's largest hosting service of open source software, with around 20 million users and 57 million repositories, in 2017.³ A recent analysis shows that Travis CI is the most popular CI service among projects on GitHub.⁴

2.2 Retrieve Raw Data
We begin by retrieving the TravisTorrent dataset [3], which contains build outcome data from GitHub projects that use the Travis CI service. As of our retrieval, the TravisTorrent dataset contains data about 3,702,595 build jobs that belong to 680,209 builds spanning 1,283 GitHub projects. Those builds include one to 252 build jobs (median of 3). In addition to build-related data, the TravisTorrent dataset contains details about the GitHub activity that triggered each build. For example, every build includes a commit hash (a reference to the build-triggering activity in its Git repository), the amount of churn in the revision, the number of modified files, and the programming language of the project. TravisTorrent also includes the number of executed and passed tests.

TravisTorrent alone does not satisfy all of the requirements of our analysis. Since TravisTorrent infers the build job outcome by parsing the raw log, it is unable to detect the outcome of 794,334 jobs (21.45%). Furthermore, TravisTorrent provides a single broken category, whereas Travis CI records build breakage in three different categories (see Subsection 2.4).

To satisfy our additional data requirements, we complement the TravisTorrent dataset by extracting additional data from the REST API that is provided by Travis CI. From the API, we collect the CI specification (i.e., .travis.yml file) used by Travis CI to create each build job and the outcome of each build job. To enable further analysis of build breakages, we also download the plain-text logs of each build job in the TravisTorrent dataset.

2.3 Clean and Process Raw Data
Since we focus on build breakages, we filter away projects that do not have any broken builds.
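The project-level filter described in Section 2.3 can be sketched against a TravisTorrent-style table as follows. The column names (gh_project_name, build_status) and status values here are illustrative assumptions, not the dataset's exact schema.

```python
# Sketch: drop projects whose build history contains no broken builds.
# Column names and status values are assumptions for illustration; check the
# real TravisTorrent schema before reuse.
import pandas as pd

def keep_projects_with_breakage(builds: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows belonging to projects with at least one broken build."""
    broken = builds["build_status"].isin(["failed", "errored"])
    projects_with_breakage = builds.loc[broken, "gh_project_name"].unique()
    return builds[builds["gh_project_name"].isin(projects_with_breakage)]

builds = pd.DataFrame({
    "gh_project_name": ["a/a", "a/a", "b/b", "b/b"],
    "build_status":    ["passed", "failed", "passed", "passed"],
})
print(keep_projects_with_breakage(builds))  # only project "a/a" survives
```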
Recommended publications
  • GitMate: Let's Write Good Code!
    GitMate: Let's Write Good Code! Artwork by Ankit, published under CC0.
    Vision: Today's world is driven by software. We are able to solve increasingly complex problems with only a few lines of code. However, with increasing complexity, code quality becomes an issue that needs to be dealt with to ensure that the software works as intended. Code reviews have become a popular tool to keep the quality up and problems solvable. They account for at least 30% of the time spent on the development of a software product. Static code analysis and code reviews are converging areas. Still, they are treated separately and thus their full synergetic potential remains unused. With GitMate, we want to reinvent the code review process. Our product will integrate static code analysis directly into the code review process to reduce the number of bugs while leaving more time for the development of your favorite features. Our product, the interactive code review bot "GitMate", is not only an easily usable static code analyser, but also actively supports the development process without any overhead for the developer. GitMate is as easy to use and interact with as a colleague next door and unique in its capabilities to even fix bugs by itself. It thereby reduces the amount of work of the reviewer, allowing them to focus on semantic problems that cannot be solved automatically.
    Product: GitMate is a code review bot. It uses coala [1] to perform static code analysis on GitHub Pull Requests [2]. It searches committed changes for possible problems and drops comments right in the GitHub review user interface, effectively following the same workflow as a human reviewer.
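The review-bot workflow described above boils down to running an analyzer on the changed code and posting the findings back to the pull request. The sketch below is a hypothetical illustration, not GitMate's implementation (GitMate uses coala); it substitutes a generic linter and posts a plain conversation comment via the GitHub REST API, whereas a real review bot would attach comments to individual diff lines. The repository slug, PR number, and token are placeholders.

```python
# Sketch: run a linter on changed files and post the report as a PR comment.
# Hypothetical illustration of a review-bot workflow, not GitMate's code.
import subprocess
import requests

def lint(files):
    """Run flake8 (a stand-in for any static analyzer) and return its report."""
    result = subprocess.run(["flake8", *files], capture_output=True, text=True)
    return result.stdout

def comment_on_pr(repo, pr_number, body, token):
    """Post a comment on the pull request's conversation thread."""
    url = f"https://api.github.com/repos/{repo}/issues/{pr_number}/comments"
    resp = requests.post(url, json={"body": body},
                         headers={"Authorization": f"token {token}"})
    resp.raise_for_status()

if __name__ == "__main__":
    report = lint(["src/app.py"])  # files changed in the PR (placeholder)
    if report:
        comment_on_pr("example/repo", 42,
                      "Automated review findings:\n" + report,
                      token="GITHUB_TOKEN")
```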
  • SeisIO: A Fast, Efficient Geophysical Data Architecture for the Julia Language
    SeisIO: a fast, efficient geophysical data architecture for the Julia language
    Joshua P. Jones¹*, Kurama Okubo², Tim Clements², and Marine A. Denolle²
    ¹4509 NE Sumner St., Portland, OR, USA; ²Department of Earth and Planetary Sciences, Harvard University, MA, USA; *Corresponding author: Joshua P. Jones ([email protected])
    Abstract: SeisIO for the Julia language is a new geophysical data framework that combines the intuitive syntax of a high-level language with performance comparable to FORTRAN or C. Benchmark comparisons with recent versions of popular programs for seismic data download and analysis demonstrate significant improvements in file read speed and orders-of-magnitude improvements in memory overhead. Because the Julia language natively supports parallel computing with an intuitive syntax, we benchmark test parallel download and processing of multi-week segments of contiguous data from two sets of 10 broadband seismic stations, and find that SeisIO outperforms two popular Python-based tools for data downloads. The current capabilities of SeisIO include file read support for several geophysical data formats, online data access using FDSN web services, IRIS web services, and SeisComP SeedLink, with optimized versions of several common data processing operations. Tutorial notebooks and extensive documentation are available to improve the user experience (UX). As an accessible example of performant scientific computing for the next generation of researchers, SeisIO offers ease of use and rapid learning without sacrificing computational performance.
    1 Introduction: The dramatic growth in the volume of collected geophysical data has the potential to lead to tremendous advances in the science (https://ds.iris.edu/data/distribution/).
  • Mining DEV for Social and Technical Insights About Software Development
    Mining DEV for social and technical insights about software development
    Maria Papoutsoglou (Aristotle University of Thessaloniki, Greece; University of Cyprus, Cyprus), Johannes Wachs (Vienna University of Economics and Business, Austria; Complexity Science Hub Vienna, Austria), Georgia M. Kapitsaki (University of Cyprus, Cyprus)
    Abstract: Software developers are social creatures: they communicate, collaborate, and promote their work in a variety of channels. Twitter, GitHub, Stack Overflow, and other platforms offer developers opportunities to network and exchange ideas. Researchers analyze content on these sites to learn about trends and topics in software engineering. However, insight mined from the text of Stack Overflow questions or GitHub issues is highly focused on detailed and technical aspects of software development. In this paper, we present a relatively new online community for software developers called DEV. On DEV users write long-form posts about their experiences, preferences, and working life in software, zooming out from specific issues and files to reflect on broader topics.
    On more socially oriented platforms like Twitter, which limits post lengths to 280 characters, discussions about software mix with an endless variety of other content. To address this gap we present a novel source of long-form text data created by people working in software called DEV (https://dev.to). DEV is "a community of software developers getting together to help one another out," focused especially on facilitating cooperation and learning. Content on DEV resembles blog and Medium posts and, at a glance, covers everything from programming language choice to technical…
  • Empirical Study of Restarted and Flaky Builds on Travis CI
    Empirical Study of Restarted and Flaky Builds on Travis CI
    Thomas Durieux (INESC-ID and IST, University of Lisbon, Portugal), Claire Le Goues (Carnegie Mellon University), Michael Hilton (Carnegie Mellon University), Rui Abreu (INESC-ID and IST, University of Lisbon, Portugal)
    ABSTRACT: Continuous Integration (CI) is a development practice where developers frequently integrate code into a common codebase. After the code is integrated, the CI server runs a test suite and other tools to produce a set of reports (e.g., the output of linters and tests). If the result of a CI test run is unexpected, developers have the option to manually restart the build, re-running the same test suite on the same code; this can reveal build flakiness, if the restarted build outcome differs from the original build. In this study, we analyze restarted builds, flaky builds, and their impact on the development workflow.
    Continuous integration is an automatic process that typically executes the test suite, may further analyze code quality using static analyzers, and can automatically deploy new software versions. It is generally triggered for each new software change, or at a regular interval, such as daily. When a CI build fails, the developer is notified, and they typically debug the failing build. Once the reason for failure is identified, the developer can address the problem, generally by modifying or revoking the code changes that triggered the CI process in the first place. However, there are situations in which the CI outcome is unexpected…
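The flakiness signal this study relies on is a disagreement between a build and its restarted counterpart on the same code. Below is a minimal sketch of that comparison; the record layout (build_id, commit, state, restarted) is an assumption for illustration, not the study's data model.

```python
# Sketch: flag commits as flaky when a restart on the same code flips the outcome.
# The record fields are illustrative, not the study's actual schema.
from collections import defaultdict

def flaky_commits(runs):
    """Group runs by commit and report commits whose outcomes disagree."""
    by_commit = defaultdict(set)
    for run in runs:
        by_commit[run["commit"]].add(run["state"])
    return [commit for commit, states in by_commit.items() if len(states) > 1]

runs = [
    {"build_id": 1, "commit": "abc123", "state": "failed", "restarted": False},
    {"build_id": 1, "commit": "abc123", "state": "passed", "restarted": True},  # restart flipped the outcome
    {"build_id": 2, "commit": "def456", "state": "passed", "restarted": False},
]
print(flaky_commits(runs))  # -> ['abc123']
```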
  • Open Source in the Enterprise
    Open Source in the Enterprise, by Andy Oram and Zaheda Bhorat. Copyright © 2018 O'Reilly Media. All rights reserved. Printed in the United States of America. Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected]. Editor: Michele Cronin; Interior Designer: David Futato; Production Editor: Kristen Brown; Cover Designer: Karen Montgomery; Copyeditor: Octal Publishing Services, Inc. July 2018: First Edition. Revision History for the First Edition: 2018-06-18: First Release. The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Open Source in the Enterprise, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc. The views expressed in this work are those of the authors, and do not represent the publisher's views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
  • Usage, Costs, and Benefits of Continuous Integration in Open-Source Projects
    Usage, Costs, and Benefits of Continuous Integration in Open-Source Projects
    Michael Hilton (Oregon State University), Timothy Tunnell (University of Illinois at Urbana-Champaign), Kai Huang (University of Illinois at Urbana-Champaign), Darko Marinov (University of Illinois at Urbana-Champaign), Danny Dig (Oregon State University)
    ABSTRACT: Continuous integration (CI) systems automate the compilation, building, and testing of software. Despite CI rising as a big success story in automated software engineering, it has received almost no attention from the research community. For example, how widely is CI used in practice, and what are some costs and benefits associated with CI? Without answering such questions, developers, tool builders, and researchers make decisions based on folklore instead of data. In this paper, we use three complementary methods to study in-depth the usage of CI in open-source projects. To understand what CI systems developers use, we analyzed 34,544 open-source projects from GitHub.
    …CI using several quality metrics. However, the study does not present any detailed information about the use of CI. In fact, despite some folkloric evidence about the use of CI, there is no systematic study about CI systems. Not only do we lack basic knowledge about the extent to which open-source projects are adopting CI, but also we have no answers to many other important questions related to CI. What are the costs of CI? Does CI deliver on the promised benefits, such as releasing more often, or helping make changes (e.g., to merge pull requests) faster? Are developers maximizing the usage of CI? Despite the widespread popularity of CI, we have very little quantitative…
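One way to answer "which CI system does a project use" at this scale is to look for each service's configuration file in the repository. The sketch below is our own illustration of that idea, not the authors' tooling, and the file-name-to-service mapping is a simplification.

```python
# Sketch: infer which CI services a checked-out repository is configured for,
# based on well-known configuration file locations (a simplification).
from pathlib import Path

CI_MARKERS = {
    ".travis.yml": "Travis CI",
    "appveyor.yml": "AppVeyor",
    "circle.yml": "CircleCI",
    ".circleci/config.yml": "CircleCI",
    "Jenkinsfile": "Jenkins",
}

def detect_ci(repo_path):
    """Return the CI services whose configuration files are present."""
    repo = Path(repo_path)
    return sorted({service for marker, service in CI_MARKERS.items()
                   if (repo / marker).exists()})

print(detect_ci("."))  # e.g. ['Travis CI'] for a project with a .travis.yml
```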
  • CI/CD Pipelines Evolution and Restructuring: a Qualitative and Quantitative Study
    CI/CD Pipelines Evolution and Restructuring: A Qualitative and Quantitative Study
    Fiorella Zampetti (University of Sannio, Italy), Salvatore Geremia (University of Sannio, Italy), Gabriele Bavota (Università della Svizzera Italiana, Switzerland), Massimiliano Di Penta (University of Sannio, Italy); {name.surname}@unisannio.it
    Abstract: Continuous Integration and Delivery (CI/CD) pipelines entail the build process automation on dedicated machines, and have been demonstrated to produce several advantages including early defect discovery, increased productivity, and faster release cycles. The effectiveness of CI/CD may depend on the extent to which such pipelines are properly maintained to cope with the system and its underlying technology evolution, as well as to limit bad practices. This paper reports the results of a study combining a qualitative and quantitative evaluation on CI/CD pipeline restructuring actions. First, by manually analyzing and coding 615 pipeline configuration change commits, we have crafted a taxonomy of 34 CI/CD pipeline restructuring actions, either improving extra-functional properties or changing the pipeline's behavior.
    • some parts of the pipelines become unnecessary and can be removed, or some others (e.g., testing environments) become obsolete and should be upgraded/replaced;
    • performance bottlenecks need to be resolved, e.g., by parallelizing or restructuring some pipeline jobs; or
    • in general, the pipeline needs to be adapted to cope with the evolution of the underlying software and systems, including technological changes (e.g., changes of architectures, operating systems, or library upgrades).
    We report the results of an empirical qualitative and quantitative study investigating how CI/CD pipelines of open source…
  • The Open Source Way 2.0
    THE OPEN SOURCE WAY 2.0, Contributors. Version 2.0, 2020-12-16: This release contains opinions.
    Table of Contents: Presenting the Open Source Way; The Shape of Things (I.e., Assumptions We Are Making); Structure of This Guide; A Community of Practice Always Rebuilding Itself; Getting Started; Community 101: Understanding, Joining, or Forming a New Community; New Project Checklist; Creating an Open Source Product Strategy; Attracting Users; Communication Norms in Open Source Software Projects; To Build Diverse Open Source Communities, Make Them Inclusive First; Guiding Participants; Why Do People Participate in Open Source Communities?; Growing Contributors; From Users to Contributors; What Is a Contribution?; Essentials of Building a Community; Onboarding; Creating a Culture of Mentorship; Project and Community Governance; Community Roles; Community Manager Self-Care; Measuring Success; Defining Healthy Communities; Understanding Community Metrics; Announcing Software Releases; Contributors; Chapter writers; Project teams.
    This guidebook is available in HTML single page and PDF. Bugs (mistakes, comments, etc.) with this release may be filed as an issue in our repo on GitHub. You are also welcome to bring it as a discussion to our forum/mailing list.
    Presenting the Open Source Way: An English idiom says, "There is a method to my madness."[1] Most of the time, the things we do make absolutely no sense to outside observers. Out of context, they look like sheer madness. But for those inside that messiness—inside that whirlwind of activity—there's a certain regularity, a certain predictability, and a certain motive.
  • A First Look at Developers' Live Chat on Gitter
    A First Look at Developers' Live Chat on Gitter
    Lin Shi, Xiao Chen, Hanzhi Jiang, Ziyou Jiang (Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China), Ye Yang (School of Systems and Enterprises, Stevens Institute of Technology, Hoboken, NJ, USA), Nan Niu (Department of EECS, University of Cincinnati, Cincinnati, OH, USA), Qing Wang (State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China)
    ABSTRACT: Modern communication platforms such as Gitter and Slack play an increasingly critical role in supporting software teamwork, especially in open source development. Conversations on such platforms often contain intensive, valuable information that may be used for better understanding OSS developer communication and collaboration. However, little work has been done in this regard. To bridge the gap, this paper reports a first comprehensive empirical study…
    KEYWORDS: Live chat, Team communication, Open source, Empirical Study
    ACM Reference Format: Lin Shi, Xiao Chen, Ye Yang, Hanzhi Jiang, Ziyou Jiang, Nan Niu, and Qing Wang. 2021. A First Look at Developers' Live Chat on Gitter. In Proceedings of the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '21), August 23–28, 2021, Athens, Greece.
  • How Are Issue Reports Discussed in Gitter Chat Rooms?
    How are issue reports discussed in Gitter chat rooms?
    Hareem Sahar¹, Abram Hindle¹, Cor-Paul Bezemer² (University of Alberta, Edmonton, Canada)
    Abstract: Informal communication channels like mailing lists, IRC and instant messaging play a vital role in open source software development by facilitating communication within geographically diverse project teams, e.g., to discuss issue reports to facilitate the bug-fixing process. More recently, chat systems like Slack and Gitter have gained a lot of popularity and developers are rapidly adopting them. Gitter is a chat system that is specifically designed to address the needs of GitHub users. Gitter hosts project-based asynchronous chats which foster frequent project discussions among participants. Developer discussions contain a wealth of information such as the rationale behind decisions made during the evolution of a project. In this study, we explore 24 open source project chat rooms that are hosted on Gitter, containing a total of 3,133,106 messages and 14,096 issue references. We manually analyze the contents of chat room discussions around 457 issue reports. The results of our study show the prevalence of issue discussions on Gitter, and that the discussed issue reports have a longer resolution time than the issue reports that are never brought on Gitter.
    Keywords: developer discussions, Gitter, issue reports
    Email address: [email protected] (Hareem Sahar). ¹Department of Computing Science, University of Alberta, Canada. ²Analytics of Software, Games and Repository Data (ASGAARD) lab, University of Alberta, Canada. Preprint submitted to Journal of Systems and Software, October 29, 2020.
    1. Introduction: Open source software (OSS) development uses the expertise of developers from all over the world, who communicate with each other via email, mailing lists [1], IRC channels [2], and modern communication platforms like Gitter and Slack [3].
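The issue references counted in this study are the links that tie a chat message to a GitHub issue report. Below is a small sketch of how such references can be pulled out of message text; the two patterns ("#123"-style mentions and full issue URLs) are our own simplification, not the study's extraction procedure.

```python
# Sketch: extract GitHub issue references from chat messages.
# Matches "#123"-style mentions and full issue URLs; a simplification.
import re

ISSUE_REF = re.compile(r"(?:#(\d+))|(?:github\.com/[\w.-]+/[\w.-]+/issues/(\d+))")

def issue_references(message):
    """Return all issue numbers referenced in a chat message."""
    return [int(a or b) for a, b in ISSUE_REF.findall(message)]

msg = "this is tracked in #457, see https://github.com/org/project/issues/512"
print(issue_references(msg))  # -> [457, 512]
```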
  • Spice Program Open Source 2015 Learnings
    Spice Program Open Source 2015 learnings. Please see this Futurice Spice Program blog post to learn the context for this list. Learned even more about practical problems in measuring time. Direct uploads from browser to AWS S3. Basics of Polymer. React + Redux (lots of it). Basic info about Google Cloud Services. Practical usage of Google Cloud Storage (Google's S3 equivalent). Heroku Pipelines. Improved & more modular Angular application structure formation. Webpack, JSON Web Token implementation on the backend, and ES6. Actually learning how Promises should be used in JavaScript. Some knowledge about Dokku, the host-it-yourself Heroku. Got practical experience and more understanding about Node.js, npm, and bacon.js. How the Azure Web App continuous deployment works. A bit on how Travis CI operates under the hood. How to make Joomla templates. Learned about Let's Encrypt on Apache with vhosts. Learned more about Web MIDI API and RxJS. Also more about ES6 and setting up a compact development environment for that. Learned basics of RxJS. Learned a lot about creating a reusable library with JS/browserify. Learned some Cycle.js. Learned a little about Node.js testing. Learned the EMV chip card specification on a very detailed level by implementing a large part of the specification. I'm using the EMV knowledge on a daily basis in my current client project. Improved my Clojure language skills. Learned to work with the Go programming language. Wrote an OSS tool that I later used in a customer project. I learned better ways to use Dagger 2 in Android and to trial and evaluate other app architectural patterns based around this.
  • Open Source Best Practices: from Continuous Integration to Static Linters
    Open Source Best Practices: From Continuous Integration to Static Linters
    Daniel G. A. Smith and Ben Pritchard, The Molecular Sciences Software Institute (@dga_smith / @molssi_nsf)
    The Molecular Sciences Software Institute (MolSSI): designed to serve and enhance the software development efforts of the broad field of computational molecular science (CMS). Funded at the institute level under the Software Infrastructure for Sustained Innovation (SI2) initiative by the National Science Foundation. Launched August 1st, 2016. A collaborative effort by Virginia Tech, U.C. Berkeley, U. Southern California, Rice U., Stanford U., Stony Brook U., Rutgers U., and Iowa State U.
    The institute's work spans software infrastructure (open-source frameworks and packages for the entire CMS community), education and training (online materials and training guides; targeted workshops, summer schools, and other training programs), and community engagement and leadership (interoperability standards and best practices; data and API standardization targets).
    Talk Motivation: Many terms are thrown around: Best Practice, Manage Code, DevOps, Build and Package, Open Source, Software Sustainability, "Just get the physics working", "Incorrect Software". "Best" is subjective and depends on your own community. Let's discuss practical applications and how to make our lives easier!
    Open Source: Open Source software is software that can be freely accessed, used,…