What Is the Price of Open-Source Software?
Journal: The Journal of Physical Chemistry Letters
Manuscript ID: jz-2015-01258h
Manuscript Type: Viewpoint
Date Submitted by the Author: 13-Jun-2015
Complete List of Authors:
Krylov, Anna; University of Southern California, Dept. of Chemistry
Herbert, John; The Ohio State University, Chemistry
Furche, Filipp; University of California, Irvine, Chemistry
Head-Gordon, Martin; University of California, Berkeley, Chemistry
Knowles, Peter; Cardiff University, School of Chemistry
Lindh, Roland; Chemistry - Ångström Laboratory, Theoretical Chemistry
Manby, Frederick; University of Bristol, School of Chemistry
Pulay, Peter; University of Arkansas, Chemistry and Biochemistry
Skylaris, Chris-Kriton; University of Southampton
Werner, Hans-Joachim; Universitaet Stuttgart, Institut fuer Theoretische Chemie

Anna I. Krylov,1∗ John M. Herbert,2† Filipp Furche,3 Martin Head-Gordon,4 Peter J. Knowles,5 Roland Lindh,6 Frederick R. Manby,7 Peter Pulay,8 Chris-Kriton Skylaris,9 and Hans-Joachim Werner10

1 Dept. of Chemistry, University of Southern California, Los Angeles, California, USA
2 Dept. of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio, USA
3 Dept. of Chemistry, University of California, Irvine, California, USA
4 Dept. of Chemistry, University of California, Berkeley, California, USA
5 School of Chemistry, Cardiff University, Cardiff, UK
6 Dept. of Chemistry–Ångström Laboratory, Uppsala University, Uppsala, Sweden
7 Center for Computational Chemistry, School of Chemistry, University of Bristol, Bristol, UK
8 Dept. of Chemistry and Biochemistry, University of Arkansas, Fayetteville, Arkansas, USA
9 School of Chemistry, University of Southampton, Highfield, Southampton, UK
10 Institut für Theoretische Chemie, Universität Stuttgart, Stuttgart, Germany

June 12, 2015

The notion that all scientific software should be open-source and free has been actively promoted in recent years, mostly from the top down via mandates from funding agencies [1] but occasionally from the bottom up, as exemplified by a recent Viewpoint in this journal. [2] A commonly articulated rationale is that the results of scientific research funded by government grants should be free for society and that the scientific community benefits from free access. The purpose of this Viewpoint is to examine the consequences of these opinions.

What is scientific software? Modern computational chemistry software is an extremely complex product based on advanced scientific ideas (models and theories) and sophisticated algorithms that transform these ideas from equations into useful tools. The development of practical software that can be used by non-experts to solve contemporary research problems requires considerable technical effort to produce and maintain robust, efficient, and validated code. Unlike the development of, e.g., a smart-phone app, where the code base is small [3] and a relatively large community can easily write extensions and add-ons, production of scientific software involves the curation of millions of lines of source code. The complexity of this code demands long-term user and developer support to maintain its integrity and performance while keeping up with new computer architectures, fixing bugs, and adding features.
Recognizing the importance of these ideas, various funding agencies in the U.S. have made “sustainable software” a key priority in the distribution of research support. [1] Sustainability is a critical goal, but one that can be realized in various ways.

Good software is important to science. Computational chemistry software is an essential scientific instrument that facilitates discovery and innovation far beyond the laboratories in which it is created, an achievement that was recognized by the 1998 and 2013 Nobel Prizes in Chemistry. [4] Focusing on quantum chemistry software in this Viewpoint, we note that today any chemist can (with very little training) use numerous quantum chemistry programs as teaching and research tools that aid in the design and interpretation of experiments.

∗[email protected]
†[email protected]

A software package should be more than just a tool for end users, however; it should also be a platform to develop and test new models and algorithms. Maintaining a code base requires extensive validation, and given the complexity of modern computational methods, even testing of “pilot code” or a “proof-of-principle” implementation requires access to basic software infrastructure, e.g., an integrals library, a self-consistent field procedure, efficient I/O and memory management, tools for manipulating tensors, etc. Modularity is a laudable goal, but in reality “interoperability” often comes at the expense of performance. In high-performance codes, the aforementioned components are tightly interwoven, to the extent that expert help is often required to modify key components or to develop non-standard interfaces to them.
As such, the ability to innovate along either applied or theoretical lines depends crucially on the quality of the software and the availability of documentation and expert support.

As examples, consider two widely used electronic structure programs, Q-Chem [5] and Molpro. [6] These codes consist of ∼5.5 and ∼2.5 million lines of source code, respectively, written in multiple languages and each in continuous development over several decades. Q-Chem incorporates scientific advances reported in more than 300 peer-reviewed scientific publications, while methods implemented for the first time in Molpro have led to 20 high-impact papers that have each been cited over 300 times. Neither code is static: more than 70 scientists are actively contributing to Molpro, and the Q-Chem developer base numbers more than 100. Such agile innovation comes at a price, however. Significant effort is required to keep the code robust, efficient, and sound, and to provide the documentation that ensures the usability of new methods and the extensibility of older ones.

Software from academia is often developed with an emphasis on ideas rather than implementation, fed by the need for timely peer-reviewed journal publications that provide ongoing grant support and future jobs for graduate students. To bring new ideas to the production level, with software that is accessible to (and useful for) the broader scientific community, contributions from expert programmers are required. These technical tasks usually cannot (and generally should not) be conducted by graduate students or postdocs, who should instead be focused on science and innovation. To this end, Q-Chem employs four scientific programmers. Other quantum chemistry codes (e.g., Molpro, [6] Turbomole, [7] Jaguar, [8] Molcas, [9] PQS, [10] and ONETEP [11]) face the same challenges and adopt similar models to ensure sustainability.
It is important to distinguish these academically led software ventures from purely commercial endeavors. The large majority of the code in a package like Q-Chem is funded by the government, either through grants to academic groups or, in some cases, through technology grants to the company itself. The role of the company programmers is to enable sustainability through bug fixes, user support, release management, and the addition of features that academic developers either cannot or will not add themselves. Programmers employed by the company place emphasis on functionality, robustness, and performance, more so than scientific innovation. They are directly addressing the “reproducibility problem”. [2] Sales revenue cannot support the entire development cost of an academic code, but it contributes critically to its sustainability. The cost that the customer pays for a code like Q-Chem reflects this funding model: it is vastly lower than the development cost, particularly for academic customers but also for industry. It primarily reflects the sustainability cost.

Software is not data. In his Viewpoint, [2] Gezelter argues that both software and data should be open, yet it is important not to conflate the two. Software is not data, and simply because it is feasible to put software on the internet does not imply that it should be posted. Software is a product that contains an intellectual component (models and algorithms) but owes its existence to additional technical efforts. Such efforts include implementation of minor but useful (or requested) features that are not publishable in the peer-reviewed literature. This is not to say that details should be withheld as proprietary information.
Just as models and algorithms should be described in full detail in scientific publications, so too should implementation details be specified, along with performance metrics (timings and scaling data) and benchmarks (energies and other computed properties). Nevertheless, the software itself is a product, not a scientific finding, more akin to, say, an NMR spectrometer (a sophisticated instrument) than to the spectra produced by that instrument.

Consider an analogy from the field of photovoltaics. Scientific findings concerning the mechanistic details of charge generation and exciton propagation in a given material are results that merit discussion in the peer-reviewed literature.