A Survey of the Practice of Computational Science
Prakash Prabhu, Thomas B. Jablin, Arun Raman, Yun Zhang, Jialu Huang, Hanjun Kim, Nick P. Johnson, Feng Liu, Soumyadeep Ghosh, Stephen Beard, Taewook Oh, Matthew Zoufaly, David Walker, David I. August
Princeton University
fpprabhu,tjablin,rarun,yunzhang,[email protected]
fhanjunk,npjohnso,fengliu,soumyade,[email protected]
ftwoh,mzoufaly,dpw,[email protected]

ABSTRACT
Computing plays an indispensable role in scientific research. Presently, researchers in science have different problems, needs, and beliefs about computation than professional programmers. In order to accelerate the progress of science, computer scientists must understand these problems, needs, and beliefs. To this end, this paper presents a survey of scientists from diverse disciplines, practicing computational science at a doctoral-granting university with very high research activity. The survey covers, among other things, prevalent programming practices within this scientific community, the importance of computational power in different fields, use of tools to enhance performance and software productivity, computational resources leveraged, and prevalence of parallel computation. The results reveal several patterns that suggest interesting avenues to bridge the gap between scientific researchers and programming tools developers.

1. INTRODUCTION
Computational science [53], a multidisciplinary field encompassing various aspects of science, engineering, and computational mathematics, is increasingly being seen as the "third approach" [23], after theory and experiment, to answering fundamental scientific questions. Researchers practicing computational science typically face two concerns competing for their time. Primarily, they must concentrate on their scientific problem by forming hypotheses, developing and evaluating models, performing experiments, and collecting data. At the same time, they also have to spend considerable time converting their models into programs and testing, debugging, and optimizing those programs.

In the past two decades, there has been an exponential increase in the amount of data generated and computation performed within many scientific disciplines [53, 55], signifying an increasing need for high performance computing. Writing correct and high performance programs is difficult even for computer scientists [33]. Given this background, this paper seeks to answer the question: How are scientists coping with the growing computing demands?

Recently, an online survey conducted a broad study of the programming practices of a wide range of researchers, revealing many potential problems encountered in correctly writing scientific programs [30, 43]. Continuing in the same spirit, this paper presents an in-depth study of the practice of computational science at Princeton University, a RU/VH¹ institution. This study is conducted through a survey of researchers from diverse scientific disciplines. This survey covers important aspects of computational science including programming practices commonly employed by researchers, the importance of computational power, and the performance enhancing strategies in use. The results are presented in the context of the university's prevailing computational environment, providing insights into diverse computational practices followed within the institution.

The analysis of survey results reveals several patterns that suggest various areas of improvement. In contrast to the popular view that scientists use only numerical algorithms written in MATLAB and FORTRAN, the survey discovered that C, C++, and Python were popular among many scientists and there is a growing need for non-numerical algorithms. Despite the availability of clusters and large-scale shared memory systems within the University and a general desire for higher performance through parallel computation, a substantial portion of scientific computation still takes place on scientists' personal computers. Although many scientists use shared-memory multicore desktops and not clusters for scientific computation, knowledge of shared-memory parallelization techniques in the scientific community is low. Furthermore, the survey determined that scientists frequently do not leverage performance analysis tools to track down the causes of poor performance and consequently "optimize" cold code while ignoring a computation's real bottlenecks.

The contributions of this paper are:
• An in-depth survey of the practice of computational science at a RU/VH institution. The survey was conducted through personal interviews with 114 researchers randomly selected from diverse fields of natural sciences, engineering, interdisciplinary sciences, and social sciences.
• An analysis of survey results that suggests several areas of improvement, both in terms of practices employed by scientific researchers and future research directions for programming tools developers.

¹ RU/VH stands for "very high research activity doctoral-granting university", as classified by the Carnegie Foundation [11].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Copyright is held by the author/owner(s).
SC’11, November 12–18, 2011, Seattle, Washington, USA.
ACM 978-1-4503-0771-0/11/11.

Field                        Discipline                           Count
Natural Sciences             Astrophysics                             3
                             Atmospheric and Oceanic Sciences         2
                             Chemistry                                5
                             Ecology and Evolutionary Biology         5
                             Physics                                  5
                             Geosciences                              6
                             Molecular Biology                        4
                             Plasma Physics                           2
Engineering                  Chemical                                 7
                             Civil and Environmental                  5
                             Mechanical and Aerospace                11
                             Electrical                              12
                             Operations Research and Financial        5
Interdisciplinary Sciences   Music                                    4
                             Applied and Computational Math           2
                             Computational Biology                    4
                             Neuroscience/Psychology                 13
Social Sciences              Economics                               10
                             Sociology                                5
                             Politics                                 4
Total                                                               114
Table 1: Subject population distribution

2. SURVEY METHODOLOGY
The survey covers a set of 114 randomly selected researchers from diverse fields of science and engineering at Princeton University. The pool of survey candidates includes all graduate students, postdoctoral associates, and research staff in various scientific disciplines at Princeton University. An email soliciting participation in the survey was initially sent to randomly selected candidates from the university database. The email mentioned "use of computation in research" as a criterion for participation. After a candidate replied indicating interest in the survey, an interview was conducted by at least two of the authors, exploring, in depth, the various aspects of scientific computing related to the candidate's research.

Table 1 shows the distribution of subjects across different scientific fields. In this paper, the word "scientist" is used in a broad sense, to cover researchers from natural sciences, engineering, interdisciplinary sciences, and social sciences. A total of 20 disciplines were represented. Of the 114 subjects, 32 were from the natural sciences, 40 from engineering, 23

the results of the survey are presented suitably categorized into the three themes mentioned above. Each theme is introduced by posing a broad set of questions, and then answering these questions through a general set of patterns observed during the survey along with data to substantiate each observation. To highlight these key patterns, and other central ideas or conclusions that appear later in the paper, we set them apart from the main text as an italicized comment.

3.1 Computing Environment
Researchers at Princeton University are heavily supported in terms of computational resources and expertise. The Princeton Institute for Computational Science and Engineering (PICSciE) [13] aims to foster the computational sciences by providing computational resources as well as the experience necessary to capitalize on those resources. At the time of writing, these resources include the larger cluster hardware available through the Terascale Infrastructure for Groundbreaking Research in Engineering and Science (TIGRESS) [10]. TIGRESS is a high performance computing center that is an outcome of collaboration between PICSciE, various research centers [8, 9, 12, 14], and a number of academic departments and faculty members.

TIGRESS offers four Beowulf clusters (with 768, 768, 1024, and 3584 processors), and a 192-processor NUMA machine with shared memory and 1 petabyte of network attached storage. These clusters serve the computational needs of 192 researchers. Administrators at TIGRESS estimate that their systems are at 80% utilization. Additionally, PICSciE offers courses, seminars, and colloquia to aid the computational sciences. Since 2003, PICSciE has offered mini-courses on data visualization, scientific programming in Python, FORTRAN, MATLAB, Maple, Perl, and other languages, technologies for parallel computing (including MPI and OpenMP), as well as courses on optimization and debugging parallel programs. Recently, PICSciE began offering a course on scientific computing. PICSciE also offers programming support for troubleshooting malfunctioning programs, parallelizing existing serial codes, and tuning software for maximum performance.

3.2 Programming Practices
Representative questions concerning this theme included: