Datadeps.Jl: Repeatable Data Setup Journal of for Reproducible Data Science
Total Page:16
File Type:pdf, Size:1020Kb
Load more
										Recommended publications
									
								- 
												  Seisio: a Fast, Efficient Geophysical Data Architecture for the Julia1 SeisIO: a fast, efficient geophysical data architecture for 2 the Julia language 1∗ 2 2 2 3 Joshua P. Jones , Kurama Okubo , Tim Clements , and Marine A. Denolle 1 4 4509 NE Sumner St., Portland, OR, USA 2 5 Department of Earth and Planetary Sciences, Harvard University, MA, USA ∗ 6 Corresponding author: Joshua P. Jones ([email protected]) 1 7 Abstract 8 SeisIO for the Julia language is a new geophysical data framework that combines the intuitive 9 syntax of a high-level language with performance comparable to FORTRAN or C. Benchmark 10 comparisons with recent versions of popular programs for seismic data download and analysis 11 demonstrate significant improvements in file read speed and orders-of-magnitude improvements 12 in memory overhead. Because the Julia language natively supports parallel computing with an 13 intuitive syntax, we benchmark test parallel download and processing of multi-week segments of 14 contiguous data from two sets of 10 broadband seismic stations, and find that SeisIO outperforms 15 two popular Python-based tools for data downloads. The current capabilities of SeisIO include file 16 read support for several geophysical data formats, online data access using FDSN web services, 17 IRIS web services, and SeisComP SeedLink, with optimized versions of several common data 18 processing operations. Tutorial notebooks and extensive documentation are available to improve 19 the user experience (UX). As an accessible example of performant scientific computing for the 20 next generation of researchers, SeisIO offers ease of use and rapid learning without sacrificing 21 computational performance. 2 22 1 Introduction 23 The dramatic growth in the volume of collected geophysical data has the potential to lead to 24 tremendous advances in the science (https://ds.iris.edu/data/distribution/).
- 
												  Empirical Study of Restarted and Flaky Builds on Travis CIEmpirical Study of Restarted and Flaky Builds on Travis CI Thomas Durieux Claire Le Goues INESC-ID and IST, University of Lisbon, Portugal Carnegie Mellon University [email protected] [email protected] Michael Hilton Rui Abreu Carnegie Mellon University INESC-ID and IST, University of Lisbon, Portugal [email protected] [email protected] ABSTRACT Continuous integration is an automatic process that typically Continuous Integration (CI) is a development practice where devel- executes the test suite, may further analyze code quality using static opers frequently integrate code into a common codebase. After the analyzers, and can automatically deploy new software versions. code is integrated, the CI server runs a test suite and other tools to It is generally triggered for each new software change, or at a produce a set of reports (e.g., the output of linters and tests). If the regular interval, such as daily. When a CI build fails, the developer is result of a CI test run is unexpected, developers have the option notified, and they typically debug the failing build. Once the reason to manually restart the build, re-running the same test suite on for failure is identified, the developer can address the problem, the same code; this can reveal build flakiness, if the restarted build generally by modifying or revoking the code changes that triggered outcome differs from the original build. the CI process in the first place. In this study, we analyze restarted builds, flaky builds, and their However, there are situations in which the CI outcome is unex- impact on the development workflow.
- 
												  Usage, Costs, and Benefits of Continuous Integration in OpenUsage, Costs, and Benefits of Continuous Integration in Open-Source Projects Michael Hilton Timothy Tunnell Kai Huang Oregon State University University of Illinois at University of Illinois at [email protected] Urbana-Champaign Urbana-Champaign [email protected] [email protected] Darko Marinov Danny Dig University of Illinois at Oregon State University Urbana-Champaign [email protected] [email protected] ABSTRACT CI using several quality metrics. However, the study does Continuous integration (CI) systems automate the compi- not present any detailed information about the use of CI. lation, building, and testing of software. Despite CI rising In fact, despite some folkloric evidence about the use of CI, as a big success story in automated software engineering, it there is no systematic study about CI systems. has received almost no attention from the research commu- Not only do we lack basic knowledge about the extent nity. For example, how widely is CI used in practice, and to which open-source projects are adopting CI, but also what are some costs and benefits associated with CI? With- we have no answers to many other important questions re- out answering such questions, developers, tool builders, and lated to CI. What are the costs of CI? Does CI deliver researchers make decisions based on folklore instead of data. on the promised benefits, such as releasing more often, or In this paper, we use three complementary methods to helping make changes (e.g., to merge pull requests) faster? study in-depth the usage of CI in open-source projects. To Are developers maximizing the usage of CI? Despite the understand what CI systems developers use, we analyzed widespread popularity of CI, we have very little quantita- 34,544 open-source projects from GitHub.
- 
												  Spice Program Open Source 2015 LearningsSpice Program Open Source 2015 learnings Please see this Futurice Spice Program blog post to learn the context for this list.   Learned even more about practical problems in measuring time. Direct uploads from browser to AWS S3. Basics of Polymer. React + Redux (lots of it). Basic info about Google Cloud Services. Practical usage of Google Cloud Storage (the Google's S3 equivalent). Heroku Pipelines. Improved & more modular Angular application structure formation. Webpack, JSON Web Token implementation on the backend, and ES6. Actually learning how Promises should be used in JavaScript. Some knowledge about Dokku, the hostityourselfHeroku. Got practical experience and more understanding about Node.js, npm, and bacon.js. How the Azure Web App continuous deployment works. A bit on how Travis CI operates under the hood. How to make Joomla templates. Learned about Let's Encrypt on Apache with vhosts. Learned more about Web MIDI API and RxJS. Also more about ES6 and setting up a compact development environment for that. Learned basics of RxJS. Learned a lot about creating a reusable library with JS/browserify. Learned some Cycle.js. Learned a little about Node.js testing. Learned the EMV chip card specification on very detailed level by implementing large part of the specification. I'm using the EMV knowledge on daily basis in my current client project. Improved my Clojure language skills. Learned to work with the Go programming language. Wrote an OSS tool that I later then used in a customer project. I learned better ways to use Dagger 2 in Android and trial and evaluate other app architectural patterns based around this.
- 
												  Open Source Best Practices: from Continuous Integration to Static LintersTHE MOLECULAR SCIENCES SOFTWARE INSTITUTE - LOGO CONCEPTS ROUND 2 Open Source Best Practices: From Continuous Integration to Static Linters Daniel G. A. Smith and Ben Pritchard The Molecular Sciences Software Institute !1 @dga_smith / @molssi_nsf 1. 3 The Molecular Sciences Software Institute (MolSSI) •Designed to serve and enhance the software development efforts of the broad field of computational molecular science (CMS). •Funded at the institute level under the Software Infrastructure for Sustained Innovation (SI2) initiative by the National Science Foundation. •Launched August 1st, 2016 •Collaborative effort by: •Virginia Tech •U.C. Berkeley •U. Southern •Rice U. •Stanford U. California •Stony Brook U. •Rutgers U. •Iowa State U. !2 The Molecular Sciences Software Institute (MolSSI) •Software Infrastructure •Build open-source frameworks and packages for the entire CMS community •Education and training •Online materials and training guides •Host a number of targeted workshops, summer schools, and other training programs •Community Engagement and Leadership •Interoperability standards and best practices !3 •Data and API standardization targets Talk Motivation •Many terms thrown around: •Best Practice •Manage Code •DevOps •Build and Package •Open Source •“Just get the physics •Software Sustainability working” •“Incorrect Software” “Best” is subjective and depends on your own community. Lets discuss practical applications and how to make our lives easier! !4 Open Source Open Source software is software that can be freely accessed, used,
- 
												  Code Coverage & Continuous IntegrationCode Coverage & Continuous Integration ATPESC 2019 Jared O’Neal Mathematics and Computer Science Division Argonne National Laboratory Q Center, St. Charles, IL (USA) July 28 – August 9, 2019 exascaleproject.org License, citation, and acknowledgments License and Citation • This work is licensed under A CreAtive Commons Attribution 4.0 InternAtionAl License (CC BY 4.0). • Requested citAtion: Jared O’NeAl, Code CoverAge & Continuous IntegrAtion, in Better Scientific SoftwAre TutoriAl, Argonne TrAining ProgrAm on Extreme-Scale Computing (ATPESC), St. ChArles, IL, 2019. DOI: 10.6084/m9.figshAre.9272813. Acknowledgements • This work wAs supported by the U.S. DepArtment of Energy Office of Science, Office of Advanced Scientific Computing ReseArch (ASCR), And by the Exascale Computing Project (17-SC-20-SC), A collAborAtive effort of the U.S. DepArtment of Energy Office of Science And the NAtionAl NucleAr Security AdministrAtion. • This work wAs performed in pArt At the Argonne NAtionAl LAborAtory, which is mAnAged by UChicago Argonne, LLC for the U.S. DepArtment of Energy under ContrAct No. DE-AC02-06CH11357 • AliciA Klinvex 2 ATPESC 2019, July 28 – August 9, 2019 CODE COVERAGE How do we determine what other tests are needed? Code coverage tools • Expose parts of the code that aren’t being tested • gcov o standard utility with the GNU compiler collection suite o Compile/link with –coverage & turn off optimization o counts the number of times each statement is executed • lcov o a graphical front-end for gcov o available at http://ltp.sourceforge.net/coverage/lcov.php
- 
												  Continuous Testing & IntegrationContinuous Testing & Integration Philippe Collet (avec beaucoup de M1 PROJET - 2019-2020 (images: pixabay) 1 slides de Guilhem. Molines) Delivering Software P. C o l l e t 2 Delivering software… • In 1995, MS released Windows 95 – Major breakthrough for consumers – Real multi-tasking with background apps • How many service packs were released? 3 Delivering software… • In 1995, MS released Windows 95 – Major breakthrough for consumers – Real multi-tasking with background apps • How many service packs were released? à 2 (yes, two) (6 months and 1 year after release) 4 Delivering software… • In 2015, MS released Windows 10 – 4 editions – Continuous, forced updates • How many service packs were released? 5 Delivering software… • In 2015, MS released Windows 10 – 4 editions – Continuous, forced updates • How many service packs were released? à couldn’t count… (1 GB on first day) 6 What is the most important thing you need to deliver software continuously? Trust 7 How do you get trust in your software delivery? • Good architecture • Development guidelines • Project management • Controlled processes • Traceability of requirements • Automation • Repeatable pipeline • Testing • Testing • And… Testing 8 Software Delivery Life Cycle (SDLC) In a nutshell P. C o l l e t 9 Single student, simple project • Ant • Maven? • SCM? • Code and click 10 Group of students, simple project • Ant / Maven • Github • Code and click • Build on local machine • Ticket management ? 11 Industrial complex project • 20 – 200 contributors • One release / year • One patch / month • > 20M LOCs • Deployment complexity • à What do you need? 12 Industrial complex project • Componentization • Independence of builds • Each component tested • Clear quality indicators • Requirement traceability • Fast builds • Each contributor only builds what they code • Deployment automation 13 Software quality • What is it? • No Defects? • Or….
- 
												  I'm Leaving You, Travis: a Continuous Integration Breakup StoryI’m Leaving You, Travis: A Continuous Integration Breakup Story David Gray Widder, Michael Hilton, Christian Kästner, Bogdan Vasilescu Carnegie Mellon University, USA ABSTRACT it’s natural for different projects to have different needs, as wellas Continuous Integration (CI) services, which can automatically build, for a project’s CI needs to change over time, as developers navigate test, and deploy software projects, are an invaluable asset in dis- these tradeoffs [2, 27]. Simply stated, one size does not fitall. tributed teams, increasing productivity and helping to maintain As a realization of the diversity of CI needs, the marketplace of code quality. Prior work has shown that CI pipelines can be sophisti- available CI tools in open-source software (OSS) is booming. While cated, and choosing and configuring a CI system involves tradeoffs. not long ago Jenkins and Travis CI were the only broadly used As CI technology matures, new CI tool offerings arise to meet the client-side and cloud-based CI services, respectively, there are now 1 distinct wants and needs of software teams, as they negotiate a 17 CI services that integrate with GitHub directly, and online 2 path through these tradeoffs, depending on their context. In this discussions are explicit about the need for tailored CI solutions. paper, we begin to uncover these nuances, and tell the story of To better support developers looking for bespoke CI solutions as open-source projects falling out of love with Travis, the earliest well as designers and builders of CI tools, in this paper we begin to and most popular cloud-based CI system.
- 
												  Download Downloadp-ISSN : 2443-2210 Jurnal Teknik Informatika dan Sistem Informasi e-ISSN : 2443-2229 Volume 7 Nomor 1 April 2021 Continuous Integration and Continuous Delivery Platform Development of Software Engineering and Software Project Management in Higher Education http://dx.doi.org/10.28932/jutisi.v7i1.3254 Riwayat Artikel Received: 5 Januari 2021 | Final Revision: 15 Maret 2021 | Accepted: 24 Maret 2021 Sendy Ferdian#1, Tjatur Kandaga#2, Andreas Widjaja#3, Hapnes Toba#4, Ronaldo Joshua#5, Julio Narabel#6 Faculty of Information Technology, Universitas Kristen Maranatha, Bandung, Indonesia [email protected] [email protected] [email protected] [email protected] [email protected] [email protected] Abstract — We present a report of the development phase of a likely to be very diverse and composed of various tools, platform that aims to enhance the efficiency of software commercially or open-source technologies, therefore project management in higher education. The platform managing application delivery lifecycle is difficult because accommodates a strategy known as Continuous Integration and Continuous Delivery (CI/CD). The phase consists of of those complexities. Dealing with those daunting and several stages, followed by testing of the system and its complex tasks, one solution is using the DevOps [3] [4] deployment. For starters, the CI/CD platform will be deployed approach. for software projects of students in the Faculty of Information The relatively new emerging DevOps is a culture where Technology, Universitas Kristen Maranatha. The goal of this development, testing, and operations teams join together paper is to show a design of an effective platform for collaborating to deliver outcome results in a continuous and continuous integration and continuous delivery pipeline to accommodate source code compilation, code analysis, code effective way [2] [5].
- 
												  Technology Profile2021 Technology Profile https://azati.ai +375 (29) 6845855 Belarus, 31 K. Marks Street, Sections 5-6 Grodno, 230025 1 Table Of Contents TABLE OF CONTENTS page 01 DEPLOYMENT, BI & DATA page 09 WAREHOUSING GENERAL INFORMATION page 02 DATA SCIENCE & MACHINE LEARNING page 10 JAVA TECHNOLOGIES page 03 MONITORING TOOLS, PORTALS & SOLUTIONS, page 11 VERSION CONTROL RUBY & JAVASCRIPT TECHNOLOGIES page 04 VERSION CONTROL, SDK & OTHER TOOLS page 12 WEB & PHP TECHNOLOGIES page 05 OTHER TOOLS page 13 MOBILE DEVELOPMENT & DATABASES page 06 SOFTWARE TESTING & QA page 07 APPLICATION DEPLOYMENT page 08 2 General Information 01 PROGRAMMING LANGUAGES: 02 MARK-UP AND MODELING 05 SOFTWARE ARCHITECTURE PATTERNS: LANGUAGES: Java Representational State Transfer (REST/RESTful) JavaScript (ES5/ES6) HTML (4/5) Model-View-Controller (MVC) PHP XSLT Microservices TypeScript UML GraphQL PL/SQL Kotlin Smalltalk C 03 PROJECT MANAGEMENT C++ METHODOLOGIES: C# Agile (Kanban/SCRUM) Groovy Waterfall Delphi Behavior-driven development (BDD) Pascal Test-driven development (TDD) Python Feature-driven development (FDD) SQL Ruby R CoffeeScript 04 DEVELOPMENT APPROACHES: Perl Continuous Delivery (CD) Bash Continuous Integration (CI) Shell 3 Java Technologies 06 JAVA TECHNOLOGIES: 07 JAVA FRAMEWORKS: Apache POI Java (7/8/9) Spring Apache Wicket Java Servlet Spring Boot Apache CXF Java Database Connectivity (JDBC) Spring REST Apache Shiro Java REST Spring MVC Apache Camel Java Persistence API (JPA) Spring Data Java Message Service (JMS) Spring Security 08 JAVA LIBRARIES: JBoss Drools
- 
												  Neo4j.Rb Documentation Release 8.2.5Neo4j.rb Documentation Release 8.2.5 Chris Grigg, Brian Underwood Sep 25, 2017 Contents 1 Introduction 3 1.1 Terminology...............................................3 1.1.1 Neo4j..............................................3 1.1.2 Neo4j.rb.............................................4 1.2 Code Examples..............................................4 1.2.1 ActiveNode...........................................4 1.3 Setup...................................................4 2 Setup 5 2.1 Ruby on Rails..............................................5 2.1.1 Generating a new app......................................5 2.1.2 Adding the gem to an existing project.............................6 2.1.3 Rails configuration.......................................6 2.1.4 Configuring Faraday......................................7 2.2 Any Ruby Project............................................8 2.2.1 Connection...........................................8 2.3 What if I’m integrating with a pre-existing Neo4j database?......................9 2.4 Heroku..................................................9 3 Upgrade Guide 11 3.1 What has changed............................................ 11 3.2 The neo4j-core gem......................................... 11 3.2.1 The new API.......................................... 12 3.3 The neo4j gem............................................. 13 3.3.1 Sessions............................................. 13 3.3.2 server_db............................................ 13 3.3.3 Outside of Rails........................................
- 
												  Continuous Integration Build Failures in PracticeContinuous Integration Build Failures in Practice by Adriaan Labuschagne A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Master of Mathematics in Computer Science Waterloo, Ontario, Canada, 2016 c Adriaan Labuschange 2016 I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public. ii Abstract Automated software testing is a popular method of quality control that aims to detect bugs before software is released to the end user. Unfortunately, writing, maintaining, and executing automated test suites is expensive and not necessarily cost effective. To gain a better understanding of the return on investment and cost of automated testing, we studied the continuous integration build results of 61 open source projects. We found that 19.4% of build failures are resolved by a change to only test code. These failures do not find bugs in the code under test, but rather bugs in the test suites themselves and represent a maintenance cost. We also found that 12.8% of the build failures we studied are due to non-deterministic tests that can be difficult to debug. To gain a better understanding of what leads to test maintenance we manually inspected both valuable and costly tests and found a variety of reasons for test maintenance. Our empirical analysis of real projects quantifies the cost of maintaining test suites and shows that reducing maintenance costs could make automated testing much more affordable.