An Effective Git and Org-Mode Based Workflow for Reproducible Research
Luka Stanisic, Arnaud Legrand and Vincent Danjean
fi[email protected]
CNRS/Inria/Univ. of Grenoble, Grenoble, France

ABSTRACT

In this paper we address the question of developing a lightweight and effective workflow for conducting experimental research on modern parallel computer systems in a reproducible way. Our approach builds on two well-known tools (Git and Org-mode) and enables us to address, at least partially, issues such as running experiments, provenance tracking, experimental setup reconstruction and replicable analysis. We have been using this methodology for two years now, and it recently enabled us to publish a fully reproducible article [12]. To fully demonstrate the effectiveness of our proposal, we have opened our two-year laboratory notebook with all the attached experimental data. This notebook and the underlying Git revision control system make it possible to illustrate and to better understand the workflow we used.

1. INTRODUCTION

In the last decades, both the hardware and the software of modern computers have become increasingly complex. Multi-core architectures comprising several accelerators (GPUs or the Intel Xeon Phi) and interconnected by high-speed networks have become mainstream in the field of High-Performance Computing. Obtaining the maximum performance of such heterogeneous machines requires relying on combinations of complex software stacks, ranging from compilers to architecture-specific libraries or languages (e.g., CUDA, OpenCL, MPI) and complex dynamic runtimes (StarPU, Charm++, OmpSs, etc.) [1]. Dynamic and opportunistic optimizations are performed at every level, and even experts have trouble fully understanding the performance of such complex systems. Such machines can no longer be considered deterministic, especially when it comes to measuring the execution times of parallel multi-threaded programs. Controlling every relevant sophisticated component during such measurements is almost impossible even in single-processor cases [8], making the full reproduction of experiments extremely difficult.

Consequently, studying computers has become very similar to studying a natural phenomenon, and it should thus follow the same principles as other scientific fields that defined them centuries ago. Although many conclusions in this domain of computer science are commonly based on experimental results, articles generally give few details about the experimental protocol. Left with insufficient information, readers generally have trouble reproducing the study and possibly building upon it. Yet, as Drummond reminds us [4], reproducibility of experimental results is the hallmark of science, and there is no reason why this should not apply to computer science as well.

Hence, a new movement promoting the development of reproducible research tools and practices has emerged, especially for the computational sciences. Such tools generally focus on the replicability of data analysis [14]. Although high-performance computing and distributed computing experiments involve running complex codes, they do not focus on execution results but rather on the time taken to run a program and on how the machine resources were used. These requirements call for different workflows and tools, since such experiments are by nature not replicable. Nevertheless, in such cases researchers should still at least aim at full reproducibility of their work.

There are many existing solutions partially addressing these issues, some of which we present in Section 3. However, none of them completely satisfied the needs and constraints of our experimental context. Therefore, we decided to develop an alternative approach based on two well-known and widely-used tools: Git and Org-mode. The contributions of our work are the following:
• We propose and describe in Section 4.1 a Git branching model for managing experimental results synchronized with the code that generated them. We have used this branching model for two years and have thus identified a few typical branching and merging operations, which we describe in Section 4.3.

• Such a branching model eases provenance tracking, reproduction of experiments and data accessibility. However, it does not address issues such as documenting the experimental process and the conclusions drawn from the results, nor acquiring meta-data about the experimental environment. To this end, in Section 4.2 we complete our branching workflow with an intensive use of Org-mode [11], which enables us to manage experimental results and meta-data and keep them in sync. Org-mode also supports literate programming, which is very convenient in a laboratory notebook and eases the writing of reproducible articles. We explain in Section 4.3 how the laboratory notebook and the Git branching can be nicely integrated to ease the preparation of a reproducible article.

• Throughout Section 4, we demonstrate the effectiveness of this approach by providing examples and by opening our two-year-old Git repository at http://gforge.inria.fr/projects/starpu-simgrid. We illustrate several points in the discussion by pointing directly to specific commits.

• Although we open our whole Git repository for illustration purposes, this is not required by our workflow. There may be situations where researchers want to share only parts of their work. We discuss in Section 5 various code and experimental data publishing options that can be used within such a workflow.

• Finally, we discuss in Section 6 how the proposed methodology helped us conduct two very different studies in the High Performance Computing (HPC) domain. We also report the limits of our approach, together with some open questions.
2. MOTIVATION AND USE CASE DESCRIPTION

Our research is centered on modeling the performance of modern computer systems. To validate our approach and models, we have to perform numerous measurements on a wide variety of machines, some of which are sometimes not even dedicated. In such a context, a presumably minor misunderstanding or inaccuracy about some parameters at small scale can result in a totally different behavior at the macroscopic level [8]. It is thus crucial to carefully collect all the useful meta-data and to use well-planned experiment designs along with coherent analyses, such details being essential to the reproduction of experimental results.

Our goal, however, was not to implement a new tool, but rather to find a good combination of already existing ones that would allow painless and effective daily usage. We were driven by purely pragmatic motives, as we wanted to keep our workflow as simple and comprehensible as possible while offering the best possible level of confidence in the reproducibility of our results.

In the rest of this section, we present the two research topics we worked on, along with their specifics and needs regarding the experimentation workflow. These needs and constraints motivated several design choices of the workflow we present in Section 4.

2.1 Case Study #1: Modeling and Simulating Dynamic Task-Based Runtimes

Our first use case aims at providing accurate performance predictions of dense linear algebra kernels on hybrid architectures (multi-core architectures comprising multiple GPUs). We modeled the StarPU runtime [1] on top of the SimGrid simulator [3]. To evaluate the quality of our modeling, we had to conduct experiments on hybrid prototype hardware whose software stack (e.g., the CUDA or MKL libraries) could evolve over time. All experiments therefore had to be conducted in a coherent, well-structured and reproducible way that allows anyone to inspect how they were performed.

2.2 Case Study #2: Studying the Performance of CPU Caches

Another use case that required particular care is the study of CPU cache performance on various Intel and ARM micro-architectures [13]. The developed source code was quite simple, containing only a few files, but there were numerous input, compilation and operating system parameters to be taken into account. Probably the most critical part of this study was the environment setup, which proved to be unstable and was thus responsible for many unexpected phenomena. Therefore, it was essential to capture, understand and easily compare as much meta-data as possible.

Limited resources (in terms of memory, CPU power and disk space) and the restricted availability of software packages on ARM processors motivated some design choices of our methodology. In particular, they required our solution to be lightweight and to have minimal dependencies. Although this research topic has not yet led to a reproducible article, we used the proposed workflow and can still track the whole history of these experiments.
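To make the meta-data requirement concrete, here is a minimal capture script of the kind one could run alongside each measurement (the file name and the exact fields are our own illustrative choices, not the authors' tooling). Snapshotting such information next to the results is what later allows runs on unstable or differently configured environments to be compared:

```shell
# Illustrative meta-data snapshot; the captured fields are assumptions,
# not the authors' actual capture list.
set -e
out=$(mktemp -d)/metadata.txt
{
  echo "date:     $(date -u +%Y-%m-%dT%H:%M:%SZ)"  # when the run happened
  echo "hostname: $(uname -n)"                      # which machine
  echo "kernel:   $(uname -sr)"                     # OS and kernel version
  echo "arch:     $(uname -m)"                      # CPU architecture
} > "$out"
cat "$out"
```

A real capture script would typically also record compiler versions, library versions and relevant environment variables; committing this file together with the results keeps measurements and meta-data in sync.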
3. RELATED WORK

In the last several years, the field of reproducible research tools has been very active, with various alternatives emerging to address diverse problems. However, only a few of them are appropriate for running experiments and measuring execution times on large, distributed, hybrid systems comprising prototype hardware and software. The ultimate goal of reproducible research is to bridge the current gap between the authors and the readers of an article (see Figure 1) by providing as much material as possible about the scientists' choices and the employed artifacts. We believe that opening a laboratory notebook is one step in that direction, since not only the positive results are shared in the final article, but also all the negative results and failures that occurred during the process.

There are successful initiatives in different computer science fields (e.g., DETER [7] in cybersecurity experimentation or PlanetLab in the peer-to-peer community) that provide good tools for their users. However, when conducting experiments and analysis in the HPC domain, several specific aspects need to be considered. We start by describing the ones related to the experimental platform setup and configuration.
Due to the specific nature of our research projects (prototype platforms with limited control and administration permissions), we have not much addressed these issues ourselves, although we think that they