Kyle Niemeyer Oregon State University
Total Page:16
File Type:pdf, Size:1020Kb
NRG A modular course on developing research software Kyle Niemeyer Oregon State University ! [email protected] " ! @kyleniemeyer ! niemeyer-research-group.github.io Motivation The stats: • 2009: 91% scientists consider research software important/very important1 • UK academics in 2014: 90% use software in research; 70% of them would find research impractical without it2 • US postdocs in 2017: 95% / 63% same3 But: practically no graduate students / researchers receive formal training in this area. (Contrast with typical experimental methods courses) 1 JE Hannay, HP Langtangen, et al. SECSE 2009. Vancouver, BC, 1–8. https://doi.org/10.1109/SECSE.2009.5069155 2 S Hettrick et al. UK Research Software Survey 2014. (2015) https://doi.org/10.7488/ds/253 3 U Nangia & DS Katz, WSSSPE 5.1 (2017) https://doi.org/10.5281/zenodo.814102 2 Computational Science? software / computational skills : computational science :: experimental skills : experimental science 3 https://edition.cnn.com/travel/article/spectacular-fountains/index.html needs http://artisanplumbing.ca Who am I? • Assistant Professor of Mechanical Engineering, Oregon State University • Research focuses on simulations and modeling of reacting fluid flows and chemical kinetics • Associate Editor-in-Chief, Journal of Open Source Software 5 https://github.com/kyleniemeyer Existing Resources • Software Carpentry workshops have become a popular way to teach fundamental skills: Unix/ command line, version control with Git, and Python programming • Similarly, Data Carpentry workshops for data science • For domain scientists: MolSSI Software Summer School 7 More (Static) Resources PERSPECTIVE Good enough practices in scientific computing Greg Wilson1, Jennifer Bryan2, Karen Cranston3, Justin Kitzes4, Lex Nederbragt5, Tracy K. Teal6 1 Software Carpentry Foundation, Austin, Texas, United States of America, 2 RStudio and Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada, 3 Department of Biology, Duke University, Durham, North Carolina, United States of America, 4 Energy and Resources Group, https://doi.org/10.1371/University of California, Berkeley, Berkeley journal.pcbi.1005510, California, United States of America, 5 Centre for Ecological and Evolutionary Synthesis, University of Oslo, Oslo, Norway, 6 Data Carpentry, Davis, California, United States of America These authors contributed equally to this work. * [email protected] Author summary Computers are now essential in all branches of science, but most researchers are never taught the equivalent of basic lab skills for research computing. As a result, data can get lost, analyses can take much longer than necessary, and researchers are limited in how effectively they can work with software and data. Computing workflows need to follow a1111111111 thesame practices aslabprojects and notebooks, withorganizeddata, documented steps, a1111111111 and the project structured for reproducibility, but researchers new to computing often a1111111111 don’t know where to start. This paper presents a set of good computing practices that a1111111111 every researchercan adopt, regardlessoftheir current levelof computational skill. These a1111111111 practices, which encompassdata management, programming, collaborating with col- leagues, organizing projects, tracking work, and writing manuscripts, are drawn from a wide varietyof published sources from our daily livesand from our work with volunteer organizations thathavedeliveredworkshopsto over 11,000people since 2010. Citation: Wilson G, Bryan J, Cranston K, Kitzes J, Nederbragt L, Teal TK(2017) Good enough practices in scientific computing. PLoS Comput Biol 13(6): e1005510.https://doi.org/10.1371/ Overview journal.pcbi.1005510 http://physics.codes We present a set of computingtools and techniquesthat every researcher can and should con- Editor: Francis Ouellette,Ontario Institute for sider adopting.Theserecommendationssynthesizeinspiration from our own work, from the Cancer Research, CANADA experiences of the thousands of people who have taken part in Software Carpentry and Data Published: June 22, 2017 Carpentry workshops over the past 6 years, and from a variety of other guides. Our recom- Copyright: 2017 Wilsonet al. This is an openmendations areaimedspecifically at people who arenew to research computing. access article distributed under the terms of the Creative Commons Attribution License, which Introduction permits unrestricted use,distribution, and reproductionhttps://doi.org/10.12688/f1000research.11407.1in any medium,provided the original Three years ago, a group of researchers involved in Software Carpentry and Data Carpentry Big influence on my class! author and source arecredited. wrote a paper called "Best Practices for Scientific Computing" [1]. That paper provided recom- Funding: The authors received no specific funding mendations for people who were already doing significant amounts of computation in their 8for this work. research. However, as computing has become an essential part of science for all researchers, Competing interests: The authors have declared there is a larger group of people new to scientific computing, and the question then becomes, that no competing interests exist. "where to start?" PLOS Computational Biology | https://doi.org/10.1371/journal.pcbi.1005510 June 22, 2017 1 / 20 My Solution: A 10-week (primarily) graduate course to fill this gap, using existing resources as inspiration/building blocks. 9 Course Philosophy • Main goal: teach students best practices and practical skills to develop useful, tested, and (potentially) sustainable research software. • Rather than give standalone assignments, all assigned work relates to a quarter-long project, which is intended to support their thesis research. • Also try to instill open science as the default—relatively early in students’ careers. • Classes mix a typical presentation-based lecture approach with interactive/hands-on work and discussion. 10 Topics Taught • Version control with Git • Working with files • Remote and collaborative version • Building command-line interfaces control with GitHub • Packaging and distributing your • Licensing and copyright Python software • Testing software, test coverage, • Introduction to parallel and continuous integration programming • Structuring Python packages • Open science, software citation, • Writing documentation and reproducibility best practices • Objects and classes in Python • Documentation with Sphinx 11 Example: module on testing https://softwaredevengresearch.github.io/lecture-packaging-testing/ Project • At beginning of class, students propose a project for the term connected to their thesis research. • This is submitted as a PR to the course_projects repo 29 Project Stages Project proposal Create project repo & fork to account Create first module with tests, submit as PR for review Configure continuous integration Create command-line interface Package software for distribution Configure Zenodo integration Write report and cite software 30 Results from 2018–2019 • Offered in Spring 2018 and Spring 2019 as “ME 599: Software Development for Engineering Research”. • (However, nothing specifically Mechanical Engineering —or even Engineering—about it.) • Enrollment: 17 students in 2018, 9 students in 2019. 31 Project Topics • designing detonation tubes • agent-learning for • calibrating blackbody autonomous path finding • extracting features with infrared cameras machine learning from • generating input for • simulating transient heat nuclear physics Monte-Carlo radiation transfer in a microchannel simulations transport • biomass cookstove • interfacing with an 8- • calculating solar-energy optimization tool channel digital pulse terms based on location • projectile testing prediction processor board analyzing radioxenon • transit system anomaly simulating and analyzing a spectra • • detection formula SAE vehicle • calculating deep-learning engine Automatic functional layers for multi-agent • design representation • optimizing and analyzing reinforcement learners wind-farm layouts • Heat exchanger analysis • analyzing solvent • analyzing spin stabilization extraction kinetics • Influx modeling for of solid rocket motors multiphase flows • simulating rapid • nodal quasi-diffusion compression machine • Control package for liquid rocket engine test stand solver for nuclear fission experiments32 Student composition 1st-year MS/PhD student 2nd-year MS/PhD student ≥3rd-year PhD student Undergraduate MEng 7% 4% 36% 32% 33 21% Whether students took Python class yes no 41% 59% 34 Comfort with command line I've used it, though I'm not exactly comfortable with it. I'm comfortable working on the command line, but not an expert What is the command line? I'm a command-line ninja � 4% 37% 59% 35 Student Feedback • In both cases, students rate the course as a whole highly • Comments suggest that all graduate students working with software/programming need this material • Some feelings of being too advanced for their experience level 36 Post-Course Work • In both classes so far, roughly half the students indicated that they plan to use and/or develop their packages further for their research • One ultimate goal: encourage submission to JOSS— one under review, and another to be submitted! 37 JOSS: Journal of Open Source Software • http://joss.theoj.org/ • Developer-friendly journal for research software packages • Affiliate of Open Source Initiative • Open access, no fees