SHIFTING MODALITIES OF USE IN THE XSEDE PROJECT

Richard Knepper

Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Doctor of Philosophy in the School of Informatics and Computing, Indiana University. May 2017

Accepted by the Graduate Faculty, Indiana University, in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Doctoral Committee:

Nathan Ensmenger, PhD

Christena Nippert-Eng, PhD

Katy Börner, PhD

Kosali Simon, PhD

April 25, 2017

For Ula, who always championed me through setbacks and successes.

For Dick, who always provided the hardest questions.

For David, who always asked “Now, how do we get you out of here?”

Acknowledgements

This dissertation is the product of a particular set of experiences in time, which were afforded to me by my concurrent roles as a professional IT manager working in the XSEDE project and as a student curious about organizations and technology. I could not have had the access, frankness, and trust that I was able to achieve without the openness and interest of the members of the XSEDE project, its staff, management, and users. I especially thank XSEDE principal investigator John Towns, who took the time to read associated papers and discuss the project with me while in the middle of running a multimillion-dollar project distributed across fifteen partner institutions, and XSEDE’s NSF Program Director Rudolf Eigenmann, who provided me a view of the NSF processes behind the organization. Data for the quantitative investigations that made up part of this project were made available thanks to Dave Hart of the National Center for Atmospheric Research, who provided publication information from the XSEDE portal and discussed utilization and publication data, and to Robert DeLeon and Tom Furlani of the University at Buffalo, who provided XSEDE project and usage data. As an interview respondent, Dr. Craig Stewart provided excellent material as well as advice during the preparation of this research.

This project would not have been possible to bring to completion without its research committee. Dr. Nathan Ensmenger provided unwavering support and assurance throughout the process of developing the area of inquiry and the points of interest which came out of my conversations with informants. Dr. Christena Nippert-Eng was a source of guidance on matters ethnological as well as in the drafting stages of this dissertation. Dr. Katy Börner informed and instructed on the quantitative and network portions of the research. Dr. Kosali Simon provided insight into policy matters and the policymaker perspective. The work below would not have been possible without the guidance and support of the late David Hakken, who offered intellectual guidance and the questions which formed the foundations of this inquiry. David was a credit to his field and always supported those investigators who looked at the unusual, the failures, and the imbalances in our world.

Richard Knepper

Shifting Modalities of Use in the XSEDE Project

The National Science Foundation’s Extreme Science and Engineering Discovery Environment (XSEDE) provides a broad range of computational resources for researchers in the United States. While the model of computational service provision that XSEDE offers suits a traditional high performance computing model well, there are some indications that computational research is moving to new modalities of use and new virtual organizations. This ethnography of the users, management, and staff of XSEDE explores how a large virtual organization is charged with both providing next-generation computational resources and reaching out to a broader community which is just beginning to embrace computational techniques. The challenges of balancing the demands of a diverse set of stakeholders, the guidance of changing leadership at the NSF, and managing the development and professionalization of the cyberinfrastructure workforce create a number of tensions and dynamics. This research project identifies how these dynamics arise and are resolved, reflecting on the science policy initiatives which define and constrain the environment. In addition to ethnographic description of the project and interview analysis, social network analyses of the XSEDE user base and of research project usage information are conducted; these show a user base of traditional HPC users, and support for research from traditional research-oriented institutions, rather than an extension to new, broader communities.

Nathan Ensmenger, PhD

Christena Nippert-Eng, PhD

Katy Börner, PhD

Kosali Simon, PhD

Contents

Dedication

Acknowledgements

Abstract

1 Introduction
1.1 Central Questions
1.2 Motivation
1.2.1 Informing studies of scientific collaboration
1.2.2 Understanding relationships between infrastructure and research
1.2.3 Informing science policy and NSF initiatives
1.3 Key Concepts
1.3.1 Cyberinfrastructure and the CI community
1.3.2 Government-sponsored cyberinfrastructure environment

2 Literature Review
2.1 Studies of Cyberinfrastructure
2.1.1 The roots of cyberinfrastructure within Big Science
2.1.2 The development of collaborative cyberinfrastructure
2.1.3 Literature on cyberinfrastructure usage and scientific production
2.2 Science Policy
2.2.1 Government funding for basic research
2.2.2 Making Grants Measurable
2.2.3 The competitive element
2.2.4 Public Management Networks for Service Delivery

3 NSF-funded cyberinfrastructure prior to XSEDE
3.1 The NSF Supercomputing Centers Program
3.2 The TeraGrid
3.2.1 TeraGrid innovations
3.2.2 TeraGrid Metrics and Incentives
3.2.3 TeraGrid tensions and dynamics

4 Research Methods
4.1 Investigator Statement
4.2 Qualitative Analyses
4.2.1 Document Analysis
4.2.2 Interviews
4.2.3 Ethnographic Observations
4.3 Quantitative Analyses
4.3.1 Data Gathering
4.3.2 Bibliometric Analysis
4.3.3 Analysis of usage data
4.4 Institutional Review

5 Findings
5.1 From TeraGrid to XSEDE
5.1.1 The XD solicitation
5.1.2 The “shotgun wedding”
5.1.3 XSEDE operations begin: the move towards service
5.2 Understanding XSEDE users
5.2.1 Changing fields of science in XSEDE
5.2.2 The “Long Tail” of XSEDE
5.2.3 How users leverage XSEDE
5.2.4 Researcher tactics
5.3 Understanding XSEDE and the CI community
5.3.1 Cyberinfrastructure Crises
5.3.2 Adaptations in the Virtual Organization

6 Conclusions
6.1 Cyberinfrastructure trends
6.2 XSEDE 2 and what comes after
6.3 Science Policy lessons
6.4 Areas for further research

Bibliography

A Interview Questions

Curriculum Vitae

Chapter 1

Introduction

Scientific research requires tools and instruments in order to conduct experiments, make measurements, and calculate results. As scientific studies have moved from individual to group efforts and technological change has moved scientific work from the laboratory to the computer, the primary instrument of inquiry has become the computer. Joint scientific work mediated by the Internet is increasingly the normal form of scientific inquiry [59]. With the development of the “collaboratories” described by Finholt and Olson [61], the work of scientists becomes increasingly collaborative, both directly, with co-authors, and indirectly, with other researchers who develop scientific software, technicians who manage instruments and systems, administrative staff who oversee project work, and granting agency program officers who determine funding levels and manage solicitations for projects. Obtaining and managing computational resources has become a key component of a number of disciplines’ research activities, and the need for computationally-skilled scientists seems to be

more urgent than ever. For some, computational methods represent a technique of scientific inquiry in addition to theory and experimentation [84]. Others dispute this, regarding the computational as secondary to the establishment and use of the scientific method [158]. Nevertheless, demand for computational resources continues to grow and scientists continue to produce research based upon computational methods. While the environment for research computing in the United States is still strongly tied to the original efforts to create programmable computer systems, the community of high-performance computing users is shifting and changing as new disciplines become more computationally oriented and more users with different needs and backgrounds begin to require computing resources in order to further their research goals.

Another word for the computational resources used to support research is cyberinfrastructure. Cyberinfrastructure is composed of computing systems, software, networking, storage, and the skilled humans who maintain all of these interdependent systems [149]. Cyberinfrastructure ranges from individual laboratory computers to high-powered supercomputing systems, but like all of the infrastructures which underlie and support it, the basic utility of cyberinfrastructure comes from the interconnection of components, which are created and maintained in different environments and time periods. The linking together of components is a central part of cyberinfrastructure. As early as 1971 [92], it was clear that providing scientists with access to computational resources was a necessity.

The next natural step was to make resources available to scientists regardless of their location. That link between researchers and instruments in disparate locations has been essential in building the fabric of the national research environment. The result has been the creation of a complex network of infrastructures available to researchers at multiple levels and scales, which offers a number of alternatives for conducting computational research. While there are patterns in developing computational resources, there do not appear to be broadly-established norms of use.

The aim of this research project is to understand the role cyberinfrastructure plays in supporting basic research and to understand how the organizations that provide cyberinfrastructure adapt to meet the needs of the scientific community. These changes are the result of a number of different activities: user choice, organizational strategies, and funding initiatives by the National Science Foundation (NSF). By understanding the interrelated activities of users, the virtual organizations that support them, and funding agencies, informed by theories of Science and Technology Studies, I intend to provide insights that may be useful to the future management of cyberinfrastructure projects and to cyberinfrastructure funding initiatives put in place to further national research goals.

In order to investigate cyberinfrastructure organizations and their uses, strategies, and changes, I examine the Extreme Science and Engineering Discovery Environment (XSEDE - http://www.xsede.org), a collaborative

project funded by the National Science Foundation to provide computational resources in support of basic research [12, 154]. The XSEDE project is a 5-year, 17-partner project, funded at $121M, with responsibilities which range from providing operational support for supercomputing center activities to code optimization to user education and outreach. Awards that fund resources for Service Provider systems range up to $77M for the Kraken supercomputer and $55M for the Stampede supercomputer. As of this writing, continuing investments are planned for an additional 5 years of the XSEDE program. The XSEDE project represents a foundational investment in general-purpose computing in support of basic research. Understanding the processes by which XSEDE changes its service delivery offerings provides information for the drafting of future science cyberinfrastructure initiatives. In a time of limited funding for resources in support of basic science, common infrastructure initiatives are one way to provide resources to a broad range of researchers. The largest of these investments are concentrated in the Supercomputing Centers, which have developed a high level of expertise in implementing and maintaining these systems, but the gains are distributed to researchers across the country [15]. I describe the NSF’s funding initiatives and history of computational programs preceding XSEDE in Chapter 3.

I make use of both qualitative and quantitative techniques to look at the XSEDE project’s activities and understand the organization, its users, and its environment. The focus,

techniques, and aims of the research were reviewed and approved by the XSEDE principal investigator and NSF program officer. While the project is funded through a grant from the NSF and therefore subject to the Freedom of Information Act (FOIA), the project leadership allowed me to leverage my access to materials as part of my job responsibilities, rather than pursuing FOIA requests for information, with the goal of providing an overall open framework that supported better description and review of the project. In order to get rich descriptive information about the people involved, I conducted twenty-two interviews with XSEDE users, staff, and NSF program officers. Since the XSEDE project has extensive planning, reporting, and communication activities, I conducted document analysis over a broad swath of XSEDE literature as well as NSF publications and documents. The XSEDE Senior Management Team allowed access to a large number of internal documents and systems used to manage the processes of work to support XSEDE. With the blessings of the leadership, I also engaged in participant observation with the project’s management team, as part of my professional responsibilities working in XSEDE’s Campus Bridging team from 2011-2016. Participant observation activities covered management meetings, project team activities, conferences, and outreach events to campus IT staff and researchers. Quantitative components of the research project involve the analysis of researcher publications and usage of computing resources to understand what fields of science are making use of resources, and what collaborative relationships exist within the XSEDE

project.

As XSEDE provides such a broad range of services to the scientific community, the organization’s leadership must make choices as to the communities and fields it serves, in order to provide benefits that meet NSF criteria for funded projects. This results in tensions between varying communities and between technology choices. Some of these tensions are related to allocation of resources, some are about the kinds of activities that XSEDE engages in and supports, and some are about the NSF’s strategies for establishing new resources. This dissertation research project focuses on the factors which create these tensions, such as the relationships between partner organizations, the requirement that the NSF meet a broad range of community needs, and the changing backdrop of computational techniques.

1.1 Central Questions

The central questions of this research focus on the changing needs of researchers in the U.S. for resources, and on the NSF’s initiatives to create cyberinfrastructure resources which address the distribution of computational resources across the range of academic institutions, from high-powered research institutions to educational institutions and workforce development programs. While the world of personal computing changes constantly, the use of computers for research has largely remained the same since the development of the teletype machine and batch processing

of computational jobs. Researchers interact with a terminal, the modern equivalent of a teletype machine, to create a batch job submission, wait for their program instructions to go through a processing queue, and then interpret the results from their submission [54]. In the meantime, in our daily lives, we use computers with graphical interfaces, by touch, and by speech in order to accomplish any number of tasks. New generations of users are entering the scientific workforce who are likely more familiar with these commonplace forms of personal computing than with the command-line environment. Furthermore, new fields of science are adopting computational techniques which further their ability to ask questions and develop theories, and the analyses these fields rely on use computing power differently than the traditional big users of computers: physicists, chemists, and engineers [30, 165].

Who is XSEDE’s user base? What are their needs? How do they get what they need from the organization? If XSEDE is a general-purpose cyberinfrastructure, then it is important for us to understand for whom this cyberinfrastructure is built and extended. The intended user base for XSEDE consists of users who have intensive computational needs and can make use of large-scale parallel systems, but also users who have modest needs and no access to such systems at their home institution. XSEDE balances these needs by providing systems that can process both large workloads and integrate many small ones. However, technology does not impact all equally; different individuals benefit or bear costs

due to the varying impacts of technology [102, 166], and XSEDE as a cyberinfrastructure virtual organization is no different. Indeed, as this study demonstrates, the distributed configuration of XSEDE as a virtual organization means that varying groups within the organization have different missions, goals, and agendas to pursue. Furthermore, XSEDE’s users exhibit strategic behavior in acquiring resources, linking activities, leveraging the importance of the project, and building legitimacy across their own initiatives. Based on the changing use of technology over time, a number of the informants interviewed for this research expressed that demand for the computational services provided by XSEDE is in the process of changing, with the distribution of needs changing shape over time as different fields of science adopt XSEDE in greater numbers [68]. By asking these questions I intend to understand the process of this change in user base, and what kinds of benefits users are able to extract from a project with a very broad scope of mission and many metrics to meet.

How do organizations like XSEDE adapt to meet needs? XSEDE is engaged in activities to meet the need for basic cyberinfrastructure, and those needs represent a moving target. Traditionally, consumers of high-performance computational capabilities have been fields in the physical sciences producing computational work at significant scale. While the demand for computational infrastructure for traditional models of high-performance computing continues to grow as these disciplines undertake ever larger and more detailed analyses [54, 148], thanks to

the ability to capture and analyze data at large scale, fields which previously had minimal computational needs have changed the kinds of questions they ask and their methods of finding answers, making use of computational techniques which can inform their field. As Hallam Stevens notes for the biological sciences, during the 1990s computational techniques, largely viewed as unnecessary by biologists for a number of years, were paired with sequencing methods which generated a large amount of data tractable to statistical analysis, allowing questions to become broader, shift focus from individual processes to global processes, and support comparative analyses [146].

What should the NSF pursue in its cyberinfrastructure initiatives to meet the needs of the scientific community? The NSF develops policies for support of research activities. Its development of cyberinfrastructure for support of basic research involves two types of initiatives. The Division of Computing and Communications Foundations (CCF) and Division of Computer and Network Systems (CNS) manage computer science and engineering initiatives to create next-generation resources that can support research with extreme needs for computational capacity, data, and responsiveness. In the Office for Advanced Cyberinfrastructure (OCI), initiatives provide the general research community with additional resources which can support a broad range of computational needs [9]. Individual instrumentation awards can support acquisitions of high-performance systems which can be shared with the broader research community. The

responsibility of the NSF is to provide a comprehensive program which supports the development of computer science and engineering while also providing capability computing in order to facilitate the research of scientists who do not have sufficient capability at their home institutions. To that end, the XSEDE project and associated NSF-funded initiatives knit together cyberinfrastructure resources in aggregate.

1.2 Motivation

This research project is motivated by the principle of improving our understanding of how scientists collaborate and interact, how cyberinfrastructure supports research activities, and what policy initiatives best suit general purpose cyberinfrastructure support. The basis of this inquiry revolves around the habits and behaviors of scientists, and how they cooperate with each other and acquire resources in order to further their agendas. Furthermore, this is intended to be an exploration of the relationship between infrastructure, specifically cyberinfrastructure, and scientific work. While all science makes use of instruments and equipment which have their own frailties [54], I argue that most implementations of cyberinfrastructure are either research projects in their own right, “production” systems which support research, or some combination of the two. And finally, I hope to make use of the conclusions here in order to help the XSEDE project leadership continue to develop their own project in order

to more effectively meet the needs of scientists, and to provide input to the National Science Foundation on its initiatives to fund cyberinfrastructure in general.

1.2.1 Informing studies of scientific collaboration

By examining the changing use of XSEDE by different fields of science and different researchers, I hope to gather understanding about the larger relationship of computational effort to scientific endeavors. A number of scholars of science and technology studies examine the relationship between infrastructure and science as more and more research projects grow in scale and number of collaborators [94, 70, 101, 54, 170]. While cyberinfrastructure shares many of the features of collaborative science research (large projects, many staff distributed across a range of organizations, management based on the production of scholarly work), cyberinfrastructure is different in that it is the service delivery activity for scientific projects, rather than a project of inquiry in and of itself. As mentioned above, the notion that computational techniques represent a new development of scientific inquiry is a disputed one. A number of disciplines are adopting computational techniques for discovery for the first time, with new possibilities for inquiry made possible by simulation, modeling, and statistical inference [165, ?]. The advocates of computational perspectives note that new techniques for analysis represent potential for new results and new conceptual views of prior problems.

Others have questions about the theoretical underpinnings of computational approaches. Understanding the relationship between research and computational activities informs us about the possible conclusions that research can make, and understanding the shift in approaches provides perspective on the direction of inquiry. By broadening the diversity of its domains and users, the NSF is attempting to bring new theories and new perspectives to bear on problems, but I argue that this change is not a one-way street: by offering resources to these users and disciplines, the inquiries of these users and disciplines are also affected, in terms of the questions asked and the means of arriving at answers.

Investigating the XSEDE project also provides opportunities for informing the study of virtual organizations. Virtual organizations are those which decouple management from service delivery, are frequently decentralized, and whose members come from a “home organization” which may have a function other than that of the virtual organization [111, 64]. Features of the XSEDE organization, namely a distributed management structure and tiered service provider system, allow for exploration of the project as an exemplar virtual organization, one which employs significant efforts to address dual-loyalty issues and resource constraints. XSEDE deals with considerable interorganizational incentive issues, as many of the partner organizations are competitive rivals for NSF funding of other projects, including large awards for Service Provider systems. In the area of resource constraints, XSEDE, while a highly-funded program in comparison

to many NSF awards, is also atypical in its mission and scale, and stands out from other projects as a result of both of these characteristics, making it a target for questions about the appropriateness of the project mission and funding levels. As a virtual organization, XSEDE also deals with issues of geographical distribution and significantly leverages technology in order to allow a management team at 15 different physical locations to work together effectively. Some of my findings provide insight into how XSEDE works to manage distance and time differences in order to have a fully-integrated management team.

1.2.2 Understanding relationships between infrastructure and research

Scientific inquiry is empirical in nature and relies on the use of common tools to provide measurements that can be viewed and inspected by other scientists [54, 139, 101]. Instruments are the bedrock of scientific activity, but modern instrumentation is worlds removed from the basic instruments we learn about in grade school. Particle detectors and interferometers are complex systems in their own right, and can differ significantly in behavior based on the methods of extracting measurements. Even relatively simple instruments such as thermometers and barometers, when used en masse, represent a problem in standardization of measurements and collecting activities [54]. Not only do the instruments affect the resulting science; the representations and descriptions made by scientists are

intermediaries in the process of communicating between phenomena and scientific thought, and have varied over time as thought about the role of science has changed [50]. Computational infrastructure represents a different way of interacting with the practice of science. To understand how this may affect inquiry, it is useful to look at the question of reproducibility of computational research.

For research that analyzes collected data through complex computational techniques, it is a reasonable expectation to be able to replicate the algorithms used to analyze the data on another computer and verify the initial results for the data that was initially collected, if not for independently-collected data. In practice, this is a much more complex process. Even if two scientists agree on the mathematical transformations of the data, the implementation of these transformations in code can have large effects on the resulting answers.
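One concrete and well-known source of such divergence (my illustration, not an example drawn from the dissertation’s fieldwork) is that floating-point arithmetic is not associative: summing the same values in a different order, as a parallel code spread across many processors might, can produce a different result.

```python
# Floating-point addition is not associative, so "the same" sum computed
# in a different order -- e.g., by a parallel reduction -- can differ.
# Illustrative only; the values are synthetic.
import random

random.seed(0)
data = [random.uniform(-1e10, 1e10) for _ in range(100_000)]
data += [1e-6] * 100_000   # many tiny values easily absorbed by large ones

forward = sum(data)
backward = sum(reversed(data))

print(forward == backward)       # typically False
print(abs(forward - backward))   # small, but nonzero, disagreement
```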

Other factors, such as the scale of computational power and the availability of data, represent similar challenges to performing replication. Likewise, simulations which model systems computationally may produce more output data than is actually tractable to manage on multiple systems, presenting its own computational problem in comparing two sets of output for reproducibility. Furthermore, not all of these activities are cleanly self-contained. When a scientist makes use of multiple different software programs in order to conduct a complex analysis, it may be possible to capture the individual codes, but the combination is not possible to save, except by a remarkably conscientious researcher making note of her activities [122, 51]. There are some tools designed to capture workflows and make them reproducible, such as the Sci2 tool, but not all types of analysis lend themselves easily to descriptive workflows. Some authors have proposed making use of cloud technologies and system images that can be run in virtual environments to provide reproducibility [83], or creating repositories where codes can be stored and shared for later reproducibility efforts [109]. Other authors have suggested that exact replication is a futile goal and propose reframing the definition of reproducibility entirely [48].

These issues around reproducibility suggest that the computational strategies employed for research have influence over the conclusions at which scientific efforts arrive. Computational techniques, which are commonly understood to be standardized (after all, the same code should provide the same results when given the same data), are actually variable and dependent on other factors. The computational world, instead of being more certain than the physical world, has instruments and methods that are just as fraught with issues as the laboratory. One of the researchers I spoke with who uses both Open Science Grid and XSEDE resources stated that he has begun to regard job runs as having an “experimental yield,” in that not all runs will succeed, and not all runs will return the expected results. Results of analysis depend on the complex interaction of code, data, and the systems which the code runs on.

The matter is further complicated when cyberinfrastructure resources

are experiments in their own right. Large systems, such as the NSF-funded Blue Waters and the Department of Energy systems, have, by merit of their size and complexity, an experimental nature. The Blue Waters system is a 13-petaFLOP system at the University of Illinois Urbana-Champaign designed to support research that requires the highest level of computational capacity. Blue Waters provides resources for highly complex computational research and also serves as a platform for exploring the issues around supporting computation at the petaFLOP scale. During its development and implementation, which would create the largest computer cluster ever built by the NSF, the project was forced to change vendors due to difficulties with implementing the original vendor IBM’s design [57], and once implemented in partnership with Cray, provided considerable understanding of problems of concurrency and failure states in large computer cluster systems [?]. The Department of Energy has, in its own right, implemented a significant number of these large-scale systems at its “Leadership Computing Facilities” at Oak Ridge National Labs, Argonne National Labs, and the National Energy Research Scientific Computing Center. These systems currently range up to the 27-petaFLOP Titan system at Oak Ridge, with planned upgrades that will provide a 200-petaFLOP system designated Summit by the end of 2018 [4].

These systems are at the edge of attainable computing scale, and managing parallel execution, not to mention component failures, is an area of investigation in computer science in its own right.

A 2009 study of so-called “Extreme Scale” systems identified that the number of systems in a cluster is not the only issue: the increasing density of systems, which allows a petaFLOPS system to be located in a departmental lab or a single computer to achieve teraFLOPS performance, is the root of problems in concurrency, energy efficiency, and resiliency [21]. Concurrency refers to the need for thousands of processors to be able to work in concert and communicate with each other across the entire system. Energy efficiency refers to the ability of system components to deliver significant increases in performance without similar increases in energy consumption and attendant heat production. Resiliency is defined as the ability of a system to keep running despite the failure of individual components. Building cyberinfrastructure requires the successful incorporation of a broad set of disciplines working together to create a highly complex system. Engineering a bridge requires principles from civil and mechanical engineering. Development of a power distribution grid requires electrical engineering, transmission and conditioning expertise, and knowledge about levels of utilization. In contrast to basic infrastructure, which may rely on a few disciplines to create a system successfully, cyberinfrastructure requires engineering principles from the silicon of the chips involved, to power and cooling issues, networking infrastructure, storage, and software engineering. Thus, the development and extension of cyberinfrastructure is a highly complex process, with questions at multiple levels which determine the outcomes of the analyses conducted.

1.2.3 Informing science policy and NSF initiatives

The final goal of this investigation is to build understanding that will inform the management of XSEDE about how change occurs within the project and the initiatives which follow it. The field of Science, Technology and Society (STS) studies, particularly the idea of “cycles of credit” proffered by Latour and Woolgar, has a significant amount of resonance with the activities of the XSEDE project, particularly the allocation of resources, itself a sort of microcosm of the NSF funding environment. Previous research has looked at the predecessor project to XSEDE, the TeraGrid, and made contributions both to the structuring of the solicitations which led to XSEDE and to the management choices within XSEDE [41, 172, 173]. XSEDE is a project aimed in part at addressing distributional issues by making resources available to those who might not otherwise have access, with the understanding that the process of evaluating science and allocating resources is a complex one. My research project is aimed at providing a translational function that utilizes applicable STS theory in order to inform policy choices by XSEDE and the NSF.

There is reason to believe that XSEDE will act on these recommendations. The XSEDE project as a rule is intensely self-scrutinizing, and has made a number of changes in response to feedback from its own external advisory board, the NSF, and internal evaluations. XSEDE must navigate the tension between providing a next-generation computing platform and providing one that supports discovery within the “long tail” of

science. The technical expertise of XSEDE and Service Provider staff supports providing an environment that is suitable for both types of users. Understanding how to select, implement, and disseminate technologies that leverage this considerable computational and technical capacity will serve the organization well during its next 5-year term.

Furthermore, the NSF is closely examining the progress of XSEDE as well as that of the Open Science Grid in order to evaluate the future of cyberinfrastructure investment at the national level [15]. The NSF investment in XSEDE is atypical for the organization, which normally funds individual research initiatives based on basic research projects. As the NSF continues to deal with a political environment that focuses on the reduction of funding for scientific programs, it is critical that programs deliver their intended results as efficiently as they do effectively. Part of the motivation for pursuing this research is to provide reflections that are useful to the NSF in drafting programs and solicitations for its ongoing Computer and Information Science and Engineering Advanced Cyberinfrastructure initiatives.

1.3 Key Concepts

In this section I illustrate some of the key concepts and standard nomenclature surrounding the cyberinfrastructure community in the US, in order to make later descriptions of the XSEDE project and its workings more clear. This section discusses the environment of government-sponsored

cyberinfrastructure as well as common concepts of cyberinfrastructure. It is provided with the intention of giving some basis for understanding the complex environment in which my informants are immersed, in which scientists work in university and government projects to pursue basic research, amid a mix of computer science, engineering, science policy, and academic cycles of credit, and in which there is significant debate about how best to provide the infrastructure which supports basic science. My hope is that by describing the national cyberinfrastructure environment well, I can provide sufficient thick description [73] in later sections without having to provide considerable supporting exposition, and furthermore encapsulate a description of cyberinfrastructure to be used in later investigations. Below I discuss some of the physical components of US cyberinfrastructure as well as its underlying policy initiatives.

1.3.1 Cyberinfrastructure and the CI community

As discussed earlier, cyberinfrastructure is the aggregate of computational, storage, networking, software, and human resources which supports research activities. While it is common to think of a brightly-lit supercomputer in a datacenter as central to cyberinfrastructure, this represents only one element of an interconnected system with distributed parts under the control of different individuals and organizations. The cyberinfrastructure community, then, is the community of providers and users of

cyberinfrastructure, including supercomputing center and university IT staff, scientific software developers (funded by research grants or by operating budgets), researchers and graduate students making use of the computing systems, and policy-makers, such as NSF program officers. The CI community is also influenced by other visible figures: leaders of large projects (such as XSEDE and the Open Science Grid, but also others) and directors and executives of supercomputing centers. This environment makes for a mix that would be easy to mistake for an example of the “garbage can theory” of organizational choice [46]. Attending a conference such as the International Conference for High Performance Computing, Networking, Storage, and Analysis (commonly referred to simply as “SC”, for Supercomputing), one sees a dazzling array of choices: hardware, software, organizations, identities, and sectors. The CI community, however, has its own norms and standards and means for making choices. Rather than being a true garbage can, consisting of a mix of solutions, issues, and actors where the choices that connect them are determined by happenstance and proximity, there are processes for selecting and using different solutions, which often correspond to the credibility of a given solution more readily than the effectiveness of its alternatives. This leads to a kind of inertia among many actors in cyberinfrastructure, which seems to have affected the development of solutions over the long term.

Tracing cyberinfrastructure from the bottom up as other scholars of science and computing [54, 101] have done, the most common means of

making use of cyberinfrastructure would be the laptop computer. This is the common, base unit of computation for most individuals involved in scientific research. The laptop computer provides facilities for recording data, performing basic analyses, and writing up results in a portable package that can be easily transported from the researcher’s office to classrooms, laboratories, and conferences. Scientists, students, technicians, and administrators commonly interact with this resource all day, every day. For some scientists, an office or lab workstation is required to interface with analog equipment, or complete calculations that are feasible on a larger system but not a laptop. From the laptop or desktop system, users are connected by networks, created and maintained by institutions and businesses, to other systems that we make use of every day. In the computational context, these systems can connect to the supercomputer in the datacenter just mentioned, but they can also make use of so-called high throughput systems, or science gateway systems. Most recently the “compute condo” and “research cloud” models have become more prevalent, drawing examples from private IT service and cloud providers. Below I detail some of the common modalities of computational science from the point of view of cyberinfrastructure providers – those administrative staff at universities and research centers charged with building, maintaining, and supporting computational resources.

High Performance Computing

High performance computing (HPC) is the embodiment of the supercomputing system as we commonly imagine it, but there are any number of manifestations of this computational modality. High performance computing frequently refers specifically to computing that requires parallel processing: analyses that require more than a single processor or machine to complete. While there are a multitude of implementations of HPC systems, from a small cluster of machines in a lab environment to the large-scale implementations at Department of Energy datacenters, there are a few characteristics which determine the type of use within a high performance computing environment and the requirements of systems that can be classified as HPC systems. HPC systems are characterized by the use of similar or identical systems working together, joined by an interconnect of some sort, with a scheduling system that allows for the submission of jobs and manages allocation of work across the system. Early high performance systems, an outgrowth of the Unix systems used for large-scale automation, business, and database implementations, used shared memory across a large number of processors, but the expense of this design and the arrival of the open source Linux operating system brought forth the commodity cluster system. Commodity clusters provide parallel processing capability based on common server hardware, making resources more affordable and expandable. Even with the efficiencies provided by commoditization of hardware,

large-scale high performance systems can represent significant capital expenditures for campuses. Particularly in an era of reduced funding for research infrastructure, acquiring resources can be difficult for universities which want to provide modest local compute resources. A number of universities have engaged in initiatives to buy “community condominium” clusters. These “condo” systems are generally acquired in stages. Campus cyberinfrastructure units build a base cluster resource, and scientists at the university are able to purchase additional systems that can be added to the cluster and managed as part of the overall system. Generally, within the condo model, the purchasing scientists have preferential access to the system, provided by the scheduling system, and are able to submit jobs at any time. Meanwhile, the cluster is administered by computing center staff and is located in a secure location, rather than in a departmental lab, office, or storage closet. Campuses which provide condo clusters typically work with faculty with start-up funds or grants to purchase systems, based on scientists’ needs. Advocates of the condo model note that this allows administration and security responsibilities to fall on IT staff rather than graduate students, and provides researchers with the computational resources needed, with the added benefit that system lifecycles can be better maintained, monitored, and regulated in the campus datacenter [23, 19, 33].

HPC system users access these resources much as they have since the 1980s, through a terminal emulation program (itself a software replacement for a teletype terminal) which allows them to enter commands and interact with text interfaces. Through this fairly basic interface, users manage data, create job scripts, and submit jobs. Jobs are processed by a scheduling system which allocates work across the supercomputer in order to ensure that the largest portion of the system is working to meet demand. The scheduling system manages starting jobs on the individual computers that make up the computer system, delivering data, and returning results to the user. HPC systems are designed with parallel processing workloads in mind, but in common use these resources may be used a number of ways, from system-wide parameter sweeps and large-scale simulations that require a large percentage of the resource to be working together and communicating, to other modalities that are either not parallel approaches to problems (“serial jobs” which run on a single computer) or that require input from the user and cannot be run via a scheduler system (“interactive jobs”). Modern supercomputing centers tend to provide resources for all three types of jobs, although they may provide different resources for each type in an effort to have the most effective allocation of resources for all of their users. Jobs at university supercomputing centers which utilize the system in its entirety are fairly rare, so the scheduling system is responsible for “packing” jobs into the system in an efficient way that allows more jobs to run concurrently.
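The batch workflow described above can be made concrete with a short sketch. The example below assumes a Slurm-managed cluster (one common scheduler among several used on academic systems); the account name, program, and resource requests are illustrative placeholders, not any particular XSEDE system’s configuration.

```python
# Sketch of the batch-job workflow, assuming a Slurm-managed cluster.
# The job script requests resources via #SBATCH directives: four nodes,
# sixteen tasks per node, a two-hour wall-clock limit, and an allocation
# account to charge ("ABC123" is a placeholder).
import subprocess

job_script = """#!/bin/bash
#SBATCH --job-name=sample-analysis
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16
#SBATCH --time=02:00:00
#SBATCH --account=ABC123

srun ./my_simulation input.dat
"""

with open("job.sh", "w") as f:
    f.write(job_script)

# Submission only places the job in the queue; the scheduler decides when
# and where it runs, "packing" it alongside other jobs to keep nodes busy.
subprocess.run(["sbatch", "job.sh"], check=True)
```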

Centers that deliver a large number of completed jobs, with system utilization at a level of 80% or more, are delivering service efficiently and effectively, as a supercomputer system that is even half idle is a waste of power and cooling resources. If the basic unit of service delivery for an HPC system is the job, then the atomic metric which allows one job to be compared to another, as well as capturing work delivered over a period of time, is the core-hour. The core-hour is the work produced by one computer processor core during one hour’s time, and many centers and projects provide statistics on core-hours to their stakeholders in order to show the amount of computational work delivered, what percent of time was spent in planned or unplanned downtime, and what types of usage are most prevalent on systems.
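The arithmetic behind the metric is simple; the figures below are invented for illustration and are not statistics from XSEDE or any real center.

```python
# Core-hours and utilization, with invented figures (illustration only).
cores = 10_000                       # processor cores in the cluster
hours_in_year = 365 * 24             # 8,760 hours
available = cores * hours_in_year    # 87.6 million core-hours per year

downtime = 1_500_000                 # core-hours lost to planned/unplanned outages
delivered = 72_000_000               # core-hours consumed by completed jobs

utilization = delivered / (available - downtime)
print(f"{utilization:.1%}")          # ~83.6%, above the 80% target noted above
```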

The level of interprocess or inter-machine communication that needs to take place determines the extent to which a particular code is parallel. Tightly-coupled codes require fast networks between systems in order to reduce the latency between the calculation states of the codes being executed. Depending on the needs of the particular analysis and the level of concurrency involved, the network interconnect can be exceedingly important to the performance of a given code on a given system. Highly parallel codes, referred to as “pleasingly” or sometimes “embarrassingly” parallel, can be executed on a number of systems at once with less reliance on interprocess communication. These highly parallel codes can be run on systems with higher-latency interconnects. The looser coupling of analyses involved in these types of problems has given rise to a more opportunistic way of acquiring computational resources, commonly referred to as high throughput computing (HTC).
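What makes a workload “pleasingly” parallel is that each task depends only on its own inputs, so tasks can be scattered across any mix of machines with no communication between them. The sketch below illustrates the pattern on a single machine; simulate() is a hypothetical stand-in for a real scientific code.

```python
# A pleasingly parallel parameter sweep: every evaluation is independent,
# so no interprocess communication is needed and execution order does not
# matter. simulate() is a hypothetical placeholder for a real code.
from multiprocessing import Pool

def simulate(params):
    temperature, pressure = params
    # ... one self-contained model evaluation would run here ...
    return temperature * pressure  # placeholder result

if __name__ == "__main__":
    sweep = [(t, p) for t in range(250, 400, 10) for p in range(1, 21)]
    with Pool() as pool:                     # workers stand in for separate machines
        results = pool.map(simulate, sweep)
    print(len(results), "independent runs completed")
```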

High Throughput Computing

Projects based on highly parallel codes, such as the popular “volunteer science” software Folding@Home or BOINC, work by distributing highly parallel calculations to be run on millions of personal computers when they are not otherwise in use, with an architecture provided by the project for job assignment, data collection, and statistics and reporting [22, 24]. The broad availability of computational resources that are not otherwise in use has given rise to opportunistic use of computer resources, such as computer labs or personal computers in offices, but also to opportunistic use of larger resources. The Open Science Grid makes extensive use of computer resources both small and large to run jobs for a number of large-scale collaborative scientific projects, most notably the analysis of data from the Large Hadron Collider project at CERN and the SBGrid structural biology project, but also supporting 92 different research projects, including materials science, bacteriology, and neuroscience, with grid computing resources that are largely volunteered from the scientific community [126, 118]. In contrast to HPC implementations, which largely concentrate on dense architectures of systems with close interconnects, HTC implementations are less dependent on these types of resources, although they can and do

take advantage of volunteer HPC systems in addition to other resources. The Open Science Grid delivers considerable capability to its dozens of member Virtual Organizations: about 800 million core-hours per year as of 2015, from systems volunteered by its member VOs and other sources, such as computer systems and clusters on university campuses. The modality of use for HTC is much the same as for HPC: users access systems via a terminal, or submit their jobs via terminal through their own system that is attached to the computational grid.
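The submission pattern differs from a single batch job mainly in scale: one description queues hundreds of independent jobs that can land anywhere in the pool. The sketch below assumes an HTCondor pool of the kind the Open Science Grid federates; the executable and file names are illustrative placeholders.

```python
# Sketch of high-throughput submission, assuming an HTCondor pool. One
# submit description queues 500 independent jobs; $(Process) expands to
# 0..499 so each job analyzes a different slice of the work.
import subprocess

submit_description = """executable = analyze_event
arguments  = $(Process)
output     = results/out.$(Process)
error      = results/err.$(Process)
log        = sweep.log
queue 500
"""

with open("sweep.sub", "w") as f:
    f.write(submit_description)

# Each job may run on any volunteered machine that becomes idle; the
# matchmaker pairs queued jobs with available resources opportunistically.
subprocess.run(["condor_submit", "sweep.sub"], check=True)
```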

Science gateways

Yet another common modality of use for scientific computation is the science gateway. Rather than forcing users to interact with systems via terminal, science gateways provide web-based functionality for usage. The web interface, through a set of software known as middleware that works between the web server and HPC or HTC resources, provides facilities where scientists can choose software for analysis, manage data, and track the provenance of computational experiments. The middleware services manage authentication, flows of data, and the invocation and execution of applications on HPC resources, as well as monitoring services which report back to the user [124]. Computational execution is handled by HPC or HTC resources, meaning that the science gateway does not provide actual computational services; rather, it provides management of activities that would otherwise be directly coordinated and completed by the user.

This approach means that it is possible to manage some optimization of tasks that would otherwise have to be discovered by the individual scientist, and such optimizations can be applied for every user of the gateway. Science gateways are commonly built for communities of scientists, either around a particular domain, such as the SEAGrid and nanoHUB gateways, or as ways of making university cyberinfrastructure available to faculty at an institution. Science gateways are also a means for gathering many streams of research under one website, including citizen science initiatives or multidisciplinary work [136]. The science gateway approach allows community members, and in some cases lay persons, to have a single web site under which data, analyses, and visualizations can be captured and shared, results can be propagated, and members can work collaboratively to develop scientific findings. Gateways can also provide resources for training and teaching concepts of computational analysis while allowing middleware to reduce the technical overhead required for getting started.
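The division of labor the middleware performs can be sketched in a few lines. Everything below is hypothetical (the function name, paths, and the Slurm-style handoff are placeholders, not any real gateway’s API); the point is that the gateway acts on the web user’s behalf and records provenance for later sharing.

```python
# Hypothetical sketch of gateway middleware: the web tier calls this on
# behalf of an authenticated user; the user never touches a terminal.
import json, subprocess, time, uuid

def run_gateway_job(user: str, application: str, dataset: str) -> dict:
    record = {
        "id": uuid.uuid4().hex,      # gateway-side job identifier
        "user": user,                # web identity, not a cluster login
        "application": application,  # pre-installed, pre-optimized code
        "dataset": dataset,
        "submitted": time.time(),
    }
    # Hand the work to a backend scheduler via a curated wrapper script,
    # so center-tuned optimizations apply to every user of the gateway.
    subprocess.run(["sbatch", f"wrappers/{application}.sh", dataset], check=True)
    with open(f"provenance/{record['id']}.json", "w") as f:
        json.dump(record, f)         # provenance: who ran what, on what data
    return record
```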

Research clouds

With the rise of cloud-based computing initiatives such as Amazon EC2, Microsoft Azure, Google App Engine, and others, researchers have frequently looked at cloud-based computational capabilities for conducting work. These offerings are frequently divided into “private clouds,” in which the tenant has sole access to hardware dedicated to that user or company;

“public clouds,” on the other hand, have multiple tenants who are provided a part of a common environment; an “on-premise cloud” provides dedicated hardware for a cloud that is completely owned and managed by the institution. Researcher adoption of cloud computational services has varied widely. On the one hand, researchers are finding that it is possible to secure computing cycles with a minimal amount of grant funding; on the other, the nature of cloud use and pricing means that an absent-minded collaborator or grad student can incur significant charges. Estimating the capability of resources and time has also proved to be a problem, leaving scientists scrambling to find funds if analyses run out of allotted time before they are completed and a paper deadline is coming up. Public or private clouds, while offering computational capability on demand at a per-hour price, provide largely atomic units of service, leaving coordination of resources up to the purchaser. Furthermore, while private clouds provide connectivity at the gigabit level, there are few offerings that can provide low-latency networks of the type that handle HPC workloads effectively. “Research clouds” take a number of forms and attempt to address some of these issues of demand for elastic resources for computational research.

The National Institutes of Health encourage extensive use of cloud resources for research. The NSF has funded a number of research cloud initiatives for investigating the use of cloud infrastructure for supporting research, with a significant amount of activity dedicated to the development

of infrastructure for instantiating and managing cloud resources for a research cloud [65, 103, 130]. More recently, the NSF has funded a “production research cloud” which dedicates a significant amount of resources to the XSEDE project. Within this research cloud, researchers can provision and start individual systems for their work, but they can also manage large numbers of systems which work together to support a cluster with parallel processing capability, science gateway systems, or other computational systems as needed. When the work is done, the system images can be archived and shared via a digital object identifier (DOI) [147]. The use of research cloud systems has resulted in a number of initiatives supporting new models of using computational resources. Due to the novelty of these types of resources, existing computational work which can leverage a flexible backend has tended to adopt research clouds more readily than entirely new uses have emerged. Many of the projects making extensive use of research clouds are creating and implementing back-end resources for existing science gateways, as elastic resources which can be leveraged as demand varies. Coordinating research cloud resources to serve as elastic computational resources for new research requires extensive work to develop and improve utilization, including new middleware tools which manage cloud systems. As cloud systems become more standardized (and begin to adopt frameworks from industry), management of the resources in the cloud becomes easier to do.
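As a sketch of the self-service provisioning model, the snippet below uses the openstacksdk client against a hypothetical OpenStack-based research cloud; the cloud name, image, flavor, and network are placeholders, not any real allocation’s values.

```python
# Sketch of on-demand provisioning on an OpenStack-based research cloud
# (openstacksdk client). Cloud, image, flavor, and network names are
# placeholders for illustration.
import openstack

conn = openstack.connect(cloud="research-cloud")  # credentials from clouds.yaml

image = conn.compute.find_image("ubuntu-22.04")
flavor = conn.compute.find_flavor("m1.medium")
network = conn.network.find_network("default")

# Instead of queueing a batch job, the researcher starts a whole system;
# several of these could be joined into a "virtual cluster."
server = conn.compute.create_server(
    name="analysis-node-1",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print(server.status)  # "ACTIVE" once the instance has booted
```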

All of the types of usage described here, from traditional supercomputers to research clouds, have been part of an effort to provide utility computing, as described by Foster and Kesselman at the very beginning of the NSF Centers program [64]. The vision behind a great number of the projects and activities in this arena is to create a computational environment that provides the appropriate amount of resources to researchers, who can make use of systems with little to no transition cost when moving between resources.

1.3.2 Government-sponsored cyberinfrastructure environment

As of the date of this writing, the XSEDE Service Provider community provides almost exclusively HPC systems for use. All of the computational resources provided by XSEDE offer HPC capacity in some form or another (e.g., on the Jetstream research cloud system, “virtual” clusters can be created from virtual machines running in the cloud). XSEDE Service Providers deliver around 2.3 billion core-hours per year (as of 2015) from the 12 Service Provider systems currently available [118].
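As a rough sense of scale, that usage figure can be converted into continuously busy processor cores; the arithmetic below is only a back-of-the-envelope illustration.

    # Back-of-the-envelope conversion of the annual usage figure above:
    # 2.3 billion core-hours per year is equivalent to roughly 263,000
    # processor cores running around the clock.
    core_hours_per_year = 2.3e9
    hours_per_year = 365 * 24                     # 8,760 hours
    sustained_cores = core_hours_per_year / hours_per_year
    print(round(sustained_cores))                 # 262557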

While the XSEDE project provides significant resources to the scientific community, there are a number of other organizations which provide HPC capabilities. This section provides detail about the network of CI organizations which provide computational resources to the research community. The environment for cyberinfrastructure investments is described in a 1993 report of a blue-ribbon NSF commission on the “Desktop to the TeraFLOP” [35]. In the report, a figure dubbed the “Branscomb Pyramid” describes an environment with a small number of HPC systems capable of supporting the largest scale of computational analyses at the top, supercomputers provided at centers as the next most prevalent systems, followed by smaller campus or departmental-scale systems, and finally individual workstations. This figure is shown below from the original report in Figure 1.1. The TeraFLOP scale of the original Branscomb Pyramid has since been replaced with a more generalized one describing “leadership-class” systems at the very top, followed by large scale resources and center supercomputers, then campus and commercial clusters, with individual computers at the base. The NSF funds initiatives at all of the levels described here, over the course of multiple strategic initiatives [14, 15]. The Cyberinfrastructure for the 21st Century strategy laid out the reasoning behind NSF investments in what it termed “Track 1” and “Track 2” systems, that is, leadership and large-scale system funding initiatives.

Leadership-class systems in the United States are generally funded by awards from the Department of Energy to computing centers in the national laboratories system, which selects sites and architectures specifically to provide large-scale performance for materials science, combustion, high-energy physics, and climate modeling applications [104]. These systems are available to researchers working for the Department of Energy or in collaborative relationships with it. NSF has also funded a leadership-class system especially dedicated to large-scale analyses [85, 53]: the Blue Waters system at the University of Illinois at Urbana-Champaign, under the NSF “Track 1” award [8, 5].

Figure 1.1: The original “Branscomb Pyramid” figures, describing the US computational infrastructure

Leadership-level systems at national labs and centers include Los Alamos National Laboratory's Trinity, the Cheyenne system at the National Center for Atmospheric Research, the Titan supercomputer at Oak Ridge National Lab, and others [152]. The US Department of Energy operates 17 national laboratories that all provide computational capability to high-energy physics researchers, with a special program for “high-end computing facilities” at Argonne, Lawrence Berkeley, and Oak Ridge National Labs. These systems are also referred to as “leadership-class” supercomputers, the largest systems available to the US scientific community. These are high-capability systems, able to handle extremely complex and data-intensive analyses. Allocations and rules for these systems tend to be relatively strict due to the large user population and the sometimes sensitive nature of analyses. High-end computing facilities such as these have attendant storage systems which can provide large amounts of storage space at speeds sufficient to meet the needs of computations that demand fast ingestion of data, or of simulations which rapidly produce data on their own. These systems are sizeable and complex enough that they may require their own facilities, or significant modifications to existing datacenter facilities, as well as staff to support systems and applications. The Blue Waters system required construction of its own facility, the National Petascale Computing Facility. Each of the leadership-class systems represents a significant investment in computational capability and marshals significant resources at the center which manages it. Furthermore, implementation of systems at this scale requires partnership between vendors and center staff, including the design and implementation of new processor, accelerator, and networking technologies, which can be complex [57].

The following tier, “large scale resources”, is made up of systems which are more broadly available, such as XSEDE systems and cluster resources available via OSG. These systems are funded through NSF awards, most of the XSEDE systems through the “Track 2” program [5], or by NASA and other agencies with ties to disciplines that have specific computational needs. Other large scale resources in this part of the pyramid include the top end of systems funded via the NSF Major Research Instrumentation (MRI) program, which funds acquisition and development of systems between $100,000 and $4,000,000. While research instrumentation is perhaps most broadly understood as devices capable of conducting scientific observations, among the spectrometers, interferometers, telescopes, and electron microscopes, ten of the most recent fifty MRI awards at the time of this writing are for computational systems of some sort or another [7]. These center and campus resources represent considerable computational capacity for their institutions, and are in many cases used as resources for collaborative projects. The NSF Track 2 systems are awarded based on the expectation that they will

become XSEDE Service Providers; systems for physics research generally have some component dedicated to the OSG; and campus clusters are frequently used for collaborative research efforts which support scientists at different universities. Like the leadership-class systems above, these systems often have designated storage systems which provide data input and output speeds sufficient to match the requirements of the computational systems. This tier of large scale resources has a much more varied set of access policies, based on the policies of the institutions where the systems are established and the scope of their research missions. Large scale resources such as the NSF Supercomputing Centers systems require significant investment in staff and facilities, sometimes requiring modification of existing datacenter facilities to implement. These systems, especially those at the high end of this section of the Branscomb Pyramid, also require significant vendor input to complete implementation, but the systems are generally extensions or assemblages of existing technologies.

The “medium scale” set of campus clusters described in the next tier of the pyramid comprises systems which are modest campus or laboratory resources, at the lower end of the NSF MRI award range described above, funded as capital expenditures for other initiatives, or purchased by campuses for researcher use as general-purpose computational resources. Medium scale systems might also be negotiated for a researcher as part of a start-up package, depending on the resources involved and the particular requirements of the field. These systems tend to be cluster computers

which may be as small as a few integrated systems deployed in the corner of a lab, or modest data center resources. They may also be purpose-built systems dedicated to a particular type of analysis at a scale large enough to handle a lab or a group of faculty members. Storage for these systems tends to be either integrated directly with the computers that make up the cluster, or provided as a small networked file system shared by all of the systems in the cluster. These systems may have some staff dedicated to their management, but smaller systems might have as few as one or two IT staff managing hardware, supporting applications, and helping researchers make use of the system.

The “base of the pyramid” consists of personal systems: the laptops mentioned above as part of every researcher's everyday activities, as well as single-computer workstations, laboratory desktop systems, and the like. These systems are capable of running scientific codes but are not able to handle either compute-intensive or data-intensive work. They are self-contained systems that can be purchased off the shelf, and while they may be made of the best components available, they have difficulty scaling to researcher needs. These systems are generally maintained directly by the faculty members responsible for them, or perhaps by a knowledgeable graduate student, which tends to cause some issues with continuity. Individual faculty members tend to hang on to systems purchased with their own funds or with funds awarded to them by the university or by grants, and

these systems tend to represent security risks in the long term, as graduate students move on and faculty members focus on research rather than information technology. However, these systems and their users represent a critical part of the CI environment, as they provide the means for new users to get acquainted with computational resources and learn how to use them to conduct analyses and obtain increasingly better answers. These systems also make up the training ground for many of the professional staff described above who maintain and develop larger-scale systems. For some researchers, this foundational range is the only size of computational resource they require, and there is evidence that impactful science occurs all across the Branscomb Pyramid [10].

Chapter 2

Literature Review

2.1 Studies of Cyberinfrastructure

2.1.1 The roots of cyberinfrastructure within Big Science

Cyberinfrastructures such as XSEDE are intimately enmeshed with the development of “Big Science” projects. These projects, first described by Alvin Weinberg [162], are “large-scale, monumental enterprises” which exemplify the 20th century fascination with technology and scientific advancement. Not only do these projects (Weinberg names space travel and high-energy physics specifically) investigate basic scientific questions, they represent national prestige and cultural touchstones much the same way that architectural and artistic endeavors on a grand scale do. The American fascination with large-scale accomplishments is well documented. Nye [117] closely examines the development and testing of the atomic bomb and the Apollo XI program as extensions of American fascination with man-made creations of scale and grandeur, demonstrative of American mastery over the environment and scientific knowledge.

These particular developments, accompanied as they were by images of destruction, were also firmly linked with questions about the morality of scientific exploration, vulnerability to long-distance violence, and satellite surveillance. Certainly the impetus towards Big Science projects increased during the Second World War, as rigorous, mission-driven project management procedures and large-scale undertakings were incorporated into the research and development communities of the major engaged countries, although the direction of research endeavors towards collaborative work was already well underway by that point [70].

The proliferation of scientists and scientific work is well documented, along with the corresponding growth in national funding of scientific projects up until the early 1970s. As Price remarks, the exponential growth of the population of scientists guarantees that there are more scientists living at any point in time than there ever have been, in aggregate, in the times preceding them, with a doubling rate of 15 years, while expenditures on scientific research doubled every 5 years [52, 18]. Price also, incidentally, notes the development of understanding about the organization and mechanisms for the conduct of science, calling for the development of a formal study of science policy in the pursuit of theory, “that will do for science what economics does for the economic life of nations.”

Lew Kowarski describes the development of an understanding of Big Science, largely on the part of its administrator-scientist practitioners: Weinberg was a researcher and then director at Oak Ridge National Laboratory; Adams, who penned “Megaloscience”, was a director of the UK Atomic Energy Authority's Culham Laboratory and later Director General of CERN; and Kowarski himself was a leader of the Data Division at CERN and technical director of the French Atomic Energy Commission. The end of this exponential growth, particularly of the growth of funding, is discussed with some concern, particularly about “disastrous oscillations” as a result of reductions in funding [93, 18]. Similarly worrying to these authors is the understanding that while the overall population of scientists is on the increase, the number of great scientists who produce truly revolutionary work seems to remain the same [163].

Capshew and Rader [39] collect a broad swath of literature focusing on Big Science activities, noting that research on Big Science has traditionally ranged across a number of different fields and perspectives. The authors generally find that there appear to be two main streams of big science research, the first focused on the shift to big science and the consequences involved in growing ties between scientists and corporations, government, and the military, the second focused on the evolution of science into big science and its further development as “big” becomes “bigger”. Within these two areas of thought, the authors identify nine different characterizations that other authors have developed to describe big science, each having more or less bearing on this work in some way:

(1) As Pathological: Weinberg, the originator of the phrase, as well as Merle Tuve [156], Norbert Wiener [164], and others, view Big Science as pathological, voicing serious doubts about the effects of the shift towards Big Science on scientific inquiry. The influence of money, purchasing, and administration is held to have an overall deleterious effect on science, both in terms of what is investigated and upon the investigators themselves. Wiener writes that multimillion-dollar projects are more easily funded than smaller investigations, to the detriment of “creative science”. Similarly, Tuve states that the organizational model of Big Science reduces the capability for disciplined thought on the part of the scientist. Weinberg in particular notes the dangers of engaging in space exploration or high-energy physics while leaving pertinent social problems unaddressed. (2) As Scientific Phenomena: Price, with his description of the growth and eventual saturation of scientific growth, is engaged in understanding Big Science, and all science, as a phenomenon in its own right, to be studied with the tools and questions that can be addressed to any other field. (3) As an Instrument: Capshew and Rader note the monumental scale of science as well as its interdependence upon technology, weaving together statements by both Weinberg and Price and noting, citing Galison [70], that the use of the computer to automate and routinize activities previously handled by scientists results in data reduction and managerial skills coming to the forefront of organizing scientific activity. In its own right, scientific inquiry is also largely an instrument of the state to achieve its ends, which are not bound to purely scientific goals.

(4) As Industrial Production: A number of authors identify Big Science with the mass production and commodification of knowledge products [171, 74]. This commodification means that science is also a market-conscious activity. Echoing Weinberg's concerns about the growth of administration, Ravetz [128] finds that engagement in basic research is essentially a large-scale capital investment, requiring the type of management activities of any project of such scope and resulting in the “concentration of power in the hands of science administrators”. Furthermore, the process of doing science is also formalized and regimented, jarring with the notion that research is a craft activity and removing boundaries that allowed work styles to vary widely. Industrial science also brings organizational impacts, requiring either scientists themselves or science administrators to make decisions about the formation of teams and the disposition of morale or management issues.

The further areas of big science literature described by Capshew and Rader characterize extended problematics of Big Science: (5) As an ethical problem, Big Science represents a break between the responsibility of individual researchers and the ethical implications of their research. With a large number of researchers on any project, each contributing a small amount of work to a greater whole, accountability for the impact of research is diluted. A further ethical issue is created when scientists are given the authority to decide which projects are funded and supported, calling into question the goal of research: is it to reveal knowledge, serve the state, better humanity?

Finally, the scale of Big Science and the resources required create an obligation between the scientists involved and their funding organization, corporation, or patron. (6) As politics: the decision to engage in a particular research project becomes a policy decision, as increasingly any project involves organizational resources, spending, and tracking of research outputs. This means that Big Science projects, which make up a greater portion of all scientific activities, are in competition with each other for resources from science policymakers. Frequently peer reviews of proposals are conducted by fellow scientists, but the funding, management, and direction for scientific activities increasingly come from corporate or military sources, with inherent policy questions. (7) Big Science becomes institutionalized. By their nature, these projects are large-scale activities with attendant expenditures and long-lived infrastructural elements. Not only are there physical and financial manifestations, but the operating organizations (labs, university centers, institutes, and the like) engage in institutionalizing activities, which include participating in coalition-building and inter-organizational cooperation, extending the tasks and functions of organizations. Institutionalization of science projects means that the number of stakeholders increases, and that research becomes accountable along multiple dimensions. In my own research on cyberinfrastructure, I note that cyberinfrastructure becomes an institutional partner of researchers, with its own stake in the course of research [89].

(8) As cultured activities, Big Science projects investigate and place importance on the type of research that their participants, as well as their constituents, find important, and reach conclusions that are similarly informed. Finally, Capshew and Rader note studies such as Pickering's [123] which characterize Big Science as (9) a form of life: scientific activities can have very similar antecedents but be enacted with vastly different activities and techniques, indicating that there is much more at work than a programmatic process of scientific advancement.

Galison and Hevly [70] collect a number of articles which trace the growth of large-scale, collaborative scientific projects, as further and further inquiries into the nature of our world necessitate more intense scrutiny by teams of scientists from a number of disciplines. Some of this scholarly work focuses on the differences between Big and Small science activities and the implications that Big Science projects have for funding and scholarly output [70]. Other scholarly work on Big Science activities examines collaboration activities and the incentives for broader collaborative efforts [79], as well as the tendency of high-energy physics to be used as the exemplar of collaborative Big Science, when not all disciplines seem to follow Big Science principles. Some of the emphasis on arrangements within the high-energy physics community, focusing on the culture of the community and the informal collaborative collective [155, 91], appears to be less representative of science disciplines overall than it originally seemed. Rather than a flexible, democratic, and flat configuration being the state of the art (and answering some of the doubts about Big Science projects turning scientists into factory workers), it appears that

interorganizational collaborations adopt a number of forms, which incorporate different levels of formality and aspects of classical bureaucracy based on the specific needs of the project and the interests and characteristics of the partners involved. An examination of the organizational structures of a number of physics projects identified a tendency towards traditional forms of bureaucracy when participants have a high degree of uncertainty about each others' actions; when collaborative work was largely similar across teams or governed by shared professional norms, collaborations adopted less formal organization, such as that exemplified in the high-energy physics community [44].

Recent quantitative research into funding of projects both big and small finds that funding Big Science activities, as opposed to the modest investigations Wiener said were being pushed aside, has variable results. One study of grant-funded activities in Canada found that the relationship between funding and impact in terms of scientific output is quite weak, and that higher-funded researchers had fewer citations of their work than the best researcher of any randomly selected pair, leading the authors to conjecture that “photo-opportunity science” was often less effective in terms of impact than many small projects which in aggregate could produce greater impact [63]. Another investigation of National Institutes of Health (NIH) funding at high levels found that review scores on larger projects (greater than $700,000 per year) failed to correlate with subsequent productivity in terms of publications [28].

As Big Science projects have proliferated, the accompanying need for computational resources to run simulations, analyze instrument data, collaborate with other scientists, and reproduce results has grown in a corresponding fashion [170]. As computer science projects develop in order to support other types of research, they tend to follow the models of the projects which preceded them. Kowarski describes the incorporation of computers into physicists' work at CERN, noting that computers were used first to replace desktop mechanical calculators, then for managing and maintaining the proliferating instruments within the laboratory, then for simulation of results, and finally for cataloging and searching publications of physics texts. As data and processing requirements increased, Kowarski notes, it became more and more pressing for facilities to have links to high-performance systems that were frequently located at a distance from the laboratory [92]. The demand for computational capacity was such that universities were faced with a dilemma between funding computer systems and funding researcher travel to the accelerator site or to another research institution with its own computer capabilities. With the growth of expensive, large-scale projects, it is impractical, as well as detrimental to the teaching of new scientists, to co-locate all researchers with the device producing experimental data.

The development of the use of computers at national laboratories in the United States is illustrative of the simultaneous development of big

science and cyberinfrastructure. Yood describes the development of the Applied Mathematics Division (AMD) at Argonne National Labs and the rise of the computational scientist. As the physical sciences at the lab became more and more reliant upon calculations and simulations performed by the computers in the AMD, the scientists at Argonne who became more involved with numerical analysis and developed expertise in translating scientific theories into algorithms became the first computational scientists. These computational scientists found themselves in competition with the early computer scientists, who needed to use the same computer resources in order to develop their own theories and methodologies. As Yood details, during the development of computational capacity at Argonne, computational scientists benefited from a connection with the guiding principles of their own scientific disciplines, while computer scientists struggled to define their own field, even to each other. The computational needs of Argonne's physical scientists continued to grow throughout the period Yood examines (1946-1992), even as the general high-energy physics community acknowledged a significant reduction in funding outlays in the early 1970s. This constant growth of computational needs has continued, as the size and density of data have increased and the number of fields making use of computers to develop analyses has increased as well.

In “Image and Logic”, Peter Galison [69] describes the incorporation of computers and the advent of simulations as part of a strategy to improve

upon the speed of processing instrument films, and outlines the differences between the Alvarez group and the CERN physicists, Lew Kowarski primary among them. As the quantity of bubble-chamber film rapidly outstripped the ability of individual physicists working in the lab to read it, Galison describes the brigades of young women scanners who were needed to identify particle tracks and the two groups' efforts to develop systems to assist or replace them. Noting that the amount of film would soon require more human labor than was feasible or even possible, these two groups pursued different strategies of identifying and measuring particle tracks. The Alvarez group steadfastly retained its focus on human intervention with machine assistance, to the point of creating scripts run by the machine which elicited human response. The CERN group, with engineers for reading machines and bubble-chamber physicists in two disparate groups within the organization, focused on the development of pattern-recognizing machines, which would then provide results to the physicists. Galison notes the Alvarez group's techniques to compare machine-recognized findings with the pattern-recognition skills of physicists, and follows the discussion of simulated experiments from early exchanges at CERN about statistical processing of data. In his chapter on the development of simulation techniques, Galison explores the “quasi-material dimension of material culture” embodied in simulations in physics, climate, number theory, and other disciplines. Simulations are quasi-material in that they not only drive the development of theory

but are the formative activities in building the instruments which inform the physical sciences; that is, as Galison notes, no instrument could be built without a simulation of its activities first being conducted: “Without the computer-based simulation, detectors like the TPC were deaf, blind, and dumb: they could not acquire data, process them, or produce results.” Simulation, then, which Edwards describes extensively in his examination of climate research [54], is intimately involved not only with the computational end of research but also with preparations for building the instruments which record new data for analysis.

Hallam Stevens describes his time working at the Broad Institute and the European Bioinformatics Institute and studying the practice of science and the rise of computational biology in “Life out of Sequence” [146]. As scientific computing grew in capability and utility in the latter half of the twentieth century, biologists for the most part refrained from the use of computational techniques. As Stevens writes, for a long time the inquiries made by biologists simply did not fit well with the computational capabilities available. When gene sequencing techniques began to create large amounts of coded data, that data could be dealt with more tractably using computational means. Stevens contends that computational biology arose in the process of creating this large stream of new data and utilizing computer systems to analyze and compare sequences. These data streams were incorporated first into text files, then into relational databases, and these relational databases were eventually federated into

multi-institutional banks. Stevens traces the development of biological questions along this course: when computers were able to examine flat text files, the questions and analyses revolved around individual genes; as they progressed to databases, the object of inquiry was alignment; and with the rise of federated databases, multielement analyses began to be possible. Stevens conjectures that in the development of biological databases a form of theoretical biology arises, in which structuring elements are related to the basis of inquiry. As databases are created, their structures reflect scientific thought about the biology at work. Finally, Stevens notes that the immense amount of data involved and the techniques for engaging in analyses require the development of imaging techniques for matching and comparison and for reducing data to more tractable forms, leading to the need for more advanced visualization techniques to provide useful ways of interacting with the data.

While for Weinberg Big Science was characterized by monumental, aspirational projects, with attendant questions about the value of pursuing those projects, the notion of Big Science has been extended to any number of projects. These Big Science projects are often characterized by their collaborative nature, with institutions and sometimes disciplines working together to address a particular question. The pre-eminent Big Science project of the years around the turn of the century from the 20th to the 21st has unquestionably been the CERN Large Hadron Collider (LHC), which has engaged the work of physicists around the world at a

multitude of institutions. By the same token, the LHC has an associated cyberinfrastructure project to support the analysis and management of the vast amounts of data created by LHC instruments. The LHC Computing Grid occupies a place among cyberinfrastructure projects similar to that of the LHC among instruments: an extremely large, distributed computational grid, with 2008 requirements of 140 million SPECint2000 units (a software benchmark produced by the Standard Performance Evaluation Corporation, or SPEC), 60 petabytes of online storage, and 50 petabytes of archive storage [13]. Without a doubt, the LHC Grid represents an extremely large collaborative project in its own right, a multi-tiered system moving data across Europe and America in order to provide computational resources in a flexible fashion. As scientific projects have evolved, their computational needs continue to increase.

2.1.2 The development of collaborative cyberinfrastructure

Research on scientific inquiry runs across a broad continuum, from examinations of individual inventors, to work with groups or labs, to the establishment of systems. With the establishment of large-scale science beginning around 1930, the continuum has expanded to include these projects, which are characterized by questions from outside about their impact on public activities, their expenditures, and their outcomes, but also by questions from within about sharing credit, cooperative work, and how to incorporate differing scientific and cultural practices

into a large-scale project [70]. Few would disagree that existing problems or “Grand Challenges” of science are all but inaccessible to the individual researcher, and that team-based projects, sometimes spanning hundreds of scientists across the world, are now the order of the day. A somewhat smaller body of research concerns itself with the material and administrative needs of these scientists, the laboratories and computers that make their inquiries possible, and the configurations and concerns that underlie the management of these large-scale projects: the supporting infrastructure. Star [145] notes in her discussion of studies of infrastructure that while the subject of inquiry initially appears dull, dramas of system-building lie underneath for one who would “study the unstudied”.

Even as early as Lawrence's cyclotron labs, historians note friction between scientists and technical staff [70], which is carried on into newer sites, such as Argonne National Laboratory [170]. In Yood's account of the Applied Mathematics Division at Argonne, the computing center becomes a central location for multiple labs, where work is completed on computer systems in support of newly prominent numerical analyses, but tensions between the center and the scientists abound. Edwards [54] discusses the work of numerical simulations of climate models, noting that the needs of the climate research community are focused on obtaining the fastest available computing systems, with demands frequently one step behind those of the national laboratories. These computational, storage, and networking systems, the software that enables their use, and the people who

support them, make up cyberinfrastructure, also referred to variously as “grid computing” [64], “collaboratories” [61], or e-Science. While cyberinfrastructure and the traditional infrastructures that preceded it have their differences, they are perhaps more alike than different, and cyberinfrastructure represents an extension of the infrastructural features that make up a large portion of our everyday lives, outside our notice until they fail.

While cyberinfrastructure is frequently presented in flashy trimmings and proclamations about novel and exciting approaches, it shares a number of features common to any infrastructural project. Infrastructure projects have considerable scale, knitting together separate, local systems into networks with a complex coordinating center. Infrastructures engage in transformative processes, not only of technology but of organizations, cultures, and rules, to manage spanning into new domains. Finally, the cooperation of local systems in the greater whole is facilitated by gateways, which may take the form of standards, protocols, or policies [55]. Cyberinfrastructure activities are characterized as enacting technology, organizing work, and institutionalization [129], a broad enough set of areas that many types of infrastructures might fit. However, there are some critical differences: cyberinfrastructure demonstrates considerable versatility over traditional infrastructures, and it is reflexive, in that producers and users frequently are part of the same global infrastructure and in that organizations can examine their own components (software and data) as information within the infrastructure [125].

Star and Ruhleder [144] describe infrastructures for information architectures as having a number of dimensions. Infrastructures are embedded into other arrangements, social and technical. They are transparent: once in place, we rely on them without thinking about how they are constituted, and conversely, they become visible only when breakdowns occur. They extend beyond a single event or place. Infrastructures must be learned by their community of practice in the process of establishing membership (which is a precursor to transparency), and they are influenced by and have influence upon their communities of practice, determining things as disparate as working hours, user interface configurations, and more. Infrastructures, as means of binding together multiple locations or times, are also the product of standards, without which none of the coordinating activity that constitutes them could take place. Infrastructures are built on pre-existing infrastructures, never implemented whole cloth, and they are never fixed in a global fashion, but only in modular increments [145, 144]. Some of the tendencies of cyberinfrastructure are shared with scientific collaborative projects as well: work based on collaborative projects finds that teams working on cyberinfrastructure must have a means of achieving commonality across fields; that there are issues with preparing younger members of the community to take over responsibilities, with defining boundaries of the organization, and with each member's home institution; and that goals are frequently emergent rather than planned. Furthermore, project members must agree on how to represent their organization and its

outputs to potential clients, funding agencies, and home disciplines [81].

Studying cyberinfrastructure, then, is one way of performing Bowker's [34] infrastructural inversion on the practice of science. While Paul Edwards examines the practice of climate science starting from the instruments and the collectors of data who contribute to the models and analyses that make climate science work [54], the cyberinfrastructure investigator begins by examining the systems, networks, software, and people that support computational analyses in these large systems. Typical cyberinfrastructure organizations are large, composed of many hundreds or thousands of members, and do not easily lend themselves to observational or ethnographic methods, but they may provide a wealth of documents and interactions that expose some meaning for the organization. Some ways of investigating infrastructures include distinguishing between a master narrative and other narratives, finding work that is not immediately visible, such as workarounds or adjustment activities, and examining both overt tasks and articulation tasks for users in order to find the explanation for unexpected obdurate barriers [145]. In terms of identifying a master narrative, cyberinfrastructure leadership often provides information of a particular tenor: managing cyberinfrastructure requires skillful arrangements of technology and other resources, with research providing the overall embedded activity that underlies cyberinfrastructure development, which may account for the research-like nature of many cyberinfrastructure projects. Furthermore, leadership notes that communicating

value to stakeholders is a multi-level process, often involving the identification of second-order effects, and that obtaining and mentoring staff and leadership of centers is difficult [27, ?]. Other research into cyberinfrastructure centers notes a concern with the sustainability of projects and participants, and with the incorporation of novel technologies into the work of existing users [129]. Considerable work has been done around the negotiations between different groups that allow work practices to be completed (articulation), including the importance of interpersonal coordination over formal organizational structures, the process of negotiating trust in multi-group projects, and dealing with uncertainties [60, 97, 25, 62].

Cyberinfrastructure is constituted of technological resources, the networks that connect them, the software that enables work to be completed on them, and the people who architect, implement, and support them. While traditional infrastructure and cyberinfrastructure have a number of similarities, the flexibility of software, the considerable flexibility of organizations, and the dynamic environmental concerns that shape it make cyberinfrastructure an extension of infrastructure. Like infrastructure in general, it is a complex system, with multiple parts and large collections of stakeholders and participants. Like any other scientific instrument, cyberinfrastructure is created to facilitate the output of scientific products, and for computer scientists, cyberinfrastructure may itself be an instrument. It is not created out of nothing, but is built upon the infrastructures which preceded it, and configured based on demands from scientists who want to make use of it and funding agencies who pursue particular standards and goals of implementation.

2.1.3 Literature on cyberinfrastructure usage and scientific production

A few resource utilization and input-output studies have been conducted specifically for the TeraGrid and XSEDE projects. Using the TeraGrid database of accounts, allocations, and CPU charges, Hart [80] examined resource utilization on the basis of individual workload characteristics, finding that patterns of job submissions to TeraGrid systems do not necessarily correspond to usage modalities; that is, submitters such as gateways that might be expected to submit jobs across a wide range of resources frequently submit jobs to a single resource rather than taking advantage of the grid as a whole. In contrast to a true grid, in which middleware submits jobs across a wide range of available resources, TeraGrid submissions are largely user-dependent, and they largely reflect the usage policies in place at a particular site. Hart also finds that allocations are successful in controlling the global demand across a number of systems. HPC usage across a federation of systems appears to reflect usage patterns previously displayed on single systems, but the manual balancing of the TeraGrid allocations system creates different patterns of usage on the individual systems.

Another study, conducted early in my own research [88], documents the results of a network analysis of TeraGrid user and project allocations. Results show that large projects frequently make use of multidisciplinary teams and that teams working on TeraGrid allocations are made up of both domain scientists and technical specialists, while smaller groups tend to be populated by domain scientists alone. Computer scientists may be members of a number of projects in varying scientific domains, while domain scientists tend to remain in their own area.
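To make the method concrete, the toy sketch below builds the kind of bipartite user-project network used in that analysis; the allocation records are invented for illustration and are not the actual TeraGrid data or code behind [88].

    # Toy bipartite user-project network, with invented allocation records.
    import networkx as nx

    allocations = [
        ("alice", "TG-AST001"), ("bob", "TG-AST001"),
        ("bob", "TG-BIO002"), ("carol", "TG-BIO002"),
        ("carol", "TG-PHY003"),
    ]

    G = nx.Graph()
    for user, project in allocations:
        G.add_node(user, kind="user")
        G.add_node(project, kind="project")
        G.add_edge(user, project)

    # Users attached to more than one project (degree > 1) occupy the
    # bridging position that the findings above attribute to technical
    # specialists and computer scientists.
    bridgers = [n for n, d in G.degree() if G.nodes[n]["kind"] == "user" and d > 1]
    print(bridgers)  # ['bob', 'carol']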

Later, Furlani et al., creators of the XDMoD project whose data I make use of in this project, used information on resource utilization to improve operations, analyzing resource-specific throughput data together with data on individual codes running on elements of XSEDE in order to characterize project activities and to identify under-performing codes [68]. The authors show that the molecular biosciences are rapidly gaining prominence in the TeraGrid and XSEDE environments, and that they represent a significant departure in usage modality (many cycles on a smaller number of cores) from traditional HPC domains such as astronomy, physics, and the atmospheric sciences, in which large analyses are employed that utilize a large number of cores.

A number of measures for improving impact assessment of TeraGrid use for its scientific users are proposed by Sheddon et al. [141], who note that the current measures (such as number of users, usage data, and publication information) provide information about outputs of the system, but not necessarily scientific outcomes. This team, established as a “Requirements Analysis Team” by TeraGrid leadership in order to ascertain requirements that would extend and improve TeraGrid activities, recommended a number of activities that would capture impact on scientific research and knowledge, including improving the proposal system in order to better capture data such as supporting grant funding, adopting the NSF's practice of keeping a database of “science nuggets” (short descriptions of scientific work done and the contribution of the TeraGrid to the project), and improving survey practices.

Moving to the publications of the XSEDE user base, Wang et al. follow a bibliometric analysis of publications supported by TeraGrid and XSEDE allocations, describing the impact of resource utilization on publication frequency [160]. Results show that while at the individual project level the use of TeraGrid and XSEDE infrastructure does not show a strong positive correlation with impact metrics (e.g., number of publications, number of citations, h-index, and g-index), when usage is aggregated at the field of science (FOS) level, larger allocations are positively correlated with all four of these impact metrics, leading to the conclusion that resources matter in terms of consumption at the aggregate FOS level.
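For reference, two of the author-level metrics named above can be computed directly from a list of per-paper citation counts. The sketch below gives minimal implementations with invented numbers; it is not the bibliometric pipeline used in [160].

    # Minimal h-index and g-index implementations; citation counts invented.
    def h_index(citations):
        """Largest h such that h papers have at least h citations each."""
        ranked = sorted(citations, reverse=True)
        return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

    def g_index(citations):
        """Largest g such that the top g papers total at least g*g citations."""
        ranked = sorted(citations, reverse=True)
        total, g = 0, 0
        for rank, c in enumerate(ranked, start=1):
            total += c
            if total >= rank * rank:
                g = rank
        return g

    papers = [23, 8, 5, 4, 4, 1, 0]
    print(h_index(papers), g_index(papers))  # 4 6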

In an effort to understand linkages between regional scientists' publication and citation, Mazloumian et al. categorize the inputs and outputs of research work based on citations and publications, with a focus on the exchange of information across national boundaries [105]. The authors identify knowledge sources and sinks by geolocation, and find that the coastal United States, England, and parts of Central Europe appear to be knowledge sources, while Asia and South America appear largely to be knowledge sinks (researchers citing others in their publications but not being cited themselves). This geographic exchange of scientific knowledge shows that flows of information can be mapped in order to identify sources and destinations of scientific information.

2.2 Science Policy

Funding in the form of grants is commonly regarded as the engine of scientific progress, providing the materials and labor required to enact the ideas of the nation's thinkers and researchers. Grants for research involve serious investments on the part of the United States government, constituting $7.3 billion for the National Science Foundation [113] and over $37 billion for the National Institutes of Health [112] in overall budget requests, of which the greatest bulk is disbursed as grants for scientific research. Grants for research represent an unusual facet of government partnerships with other organizations, posing particular questions about monitoring, outcomes, and the process by which research grants are awarded and kept. Analysis and measurement of the processes of research is not an easy task compared to other services paid for by government, and yet the need for support of scientific research has been emphasized since the end of the Second World War [36]. Further clouding the picture, a number of current “Grand Challenge” projects [42] have been identified for specific targeted research investment that will require

substantial collaborative efforts across multiple institutions and disciplines, creating an environment where cooperation and competition mix together. Given the need for educational and economic improvement that is driven by the products of scientific research, and the recent political climate in the US and Europe, which seeks to cut funding in the interest of generating savings, understanding the processes by which grant funding for science works and how the best results of science policy may be obtained is critical to justifying the continued investment in scientific research, and to improving return on those investments, which yield impacts that affect the well-being of nations.

In this section, I explicate the processes of grant-funded research in the United States and examine characteristics of the current system of funding in comparison to the more common practice of contracting out services, describing some of the key differences in the research process that necessitate grants as opposed to contracts, and making clear the rationale for the use of grant funding for scientific research as opposed to other methods of funding. Secondly, I examine the most common form of selection and monitoring in the grant funding process: peer review of grant proposals and science products. I describe some of the benefits and shortcomings of peer review and the role of peer review as a performance management tool in the context of new initiatives for monitoring performance in recent years. Finally, I provide an overview of the system of competitive processes for funding cooperative research, looking for the effects of this system on the processes of research.

2.2.1 Government funding for basic research

The United States Government has long used grants as a particular mode of completing work in order to support and stimulate activities that it wishes to accomplish. Grants are distinguished from direct activities of government in that they are executed by outside entities, and they differ from indirect activities such as contracts, loans, and regulations in that grants tend to be less prescriptive about the process of the activity and the monitoring that can take place during grant execution [132]. Scientific research is particularly suited to grant funding, as it tends to be open-ended and, in many cases, the final applications may be very far from the initial exploratory research. In contrast to contracted projects or even applied engineering projects, scientific research is intended to answer particular questions or provide specific information about some previously unknown quantity. A contracted project generally provides a particular service or good that is usually well understood and has identifiable, concrete qualities that define its performance in some way or another. Likewise, an engineering process leverages existing knowledge in order to create a previously nonexistent thing (a bridge, a computer, a rocket engine) with some defined characteristics, tolerances, and performance that the finished project must meet. Governments fund scientific research to get answers about some operation, phenomenon, or interaction

in order to provide information on which to act, either in the sense of informing policy decisions or in the sense of informing the engineering of technologies that take advantage of the knowledge obtained in research. In this sense, the stock in trade of research is unexplored territory, making it difficult to provide timetables or define outcomes. The answers that research provides may be significantly different from those expected at the outset of the investigation. At the same time, science does have a generalized productive function, in that most researchers are expected to provide at least a timetable by which they can expect to have determined an answer, or determined that further information is necessary in order to arrive at an answer.

Historically, funding for research can be traced back to the pioneering efforts that took place in the French Academy of Sciences from the late 18th century up to the First World War. Work on the history of grant-funded activity traces the shift from prizes rewarding researchers for past work to the formation of the academy's funds for supporting proposed research [49]. Hanson [78] details the contrast between prizes and grants in his examination of McClellan's [106] history of scientific societies. Hanson cites principal-agent theory in explaining the shift from paying for results to paying for effort. As principal-agent theory states, paying for results is most successful when the principal can easily specify the results and is risk-averse, while the agent is able to take on more risk and raise capital. Paying for effort is attractive to the principal when effort is

easily monitored, when there is more information to guide the selection of agents, and when the principal is prepared to take on more risk. In this way, Hanson describes the shift between retrospective and prospective forms of funding for scientific research, as British and French governing bodies began to reduce the amount of rewards funding past research in favor of grants that would sponsor later research. The modern era of research funding began when Vannevar Bush embarked on his program to extend the funding and efforts of wartime research bodies into the National Science Foundation [100]. At this point in time, university research programs are reliant on federal money in order to carry out their agendas [127], and federal funds support more than 60% of academic research [?]. In the overall picture of research funding, about 200 higher education institutions spend roughly 95% of US federal funding for research on campuses. Other areas of government investment include government laboratories, which receive roughly one-third of federal expenditures for research and development, and federal contractors, which partner both with government laboratories and with research groups at universities and colleges [47].

In describing the modern grant funding environment, Beam and Conlan [132] detail a number of features of grants that make them particularly suited to funding scientific research. Grants are gifts with the intention of supporting and fostering a particular activity. The grantor is thus able to participate in service provision, while leaving the details of service provision to the grantee. Grants do not need to be paid back, and

do not require a particular product to be delivered to the donor, but they do represent a structured relationship in that some kind of activity (as in Hanson's paying-for-effort) is stimulated with an end to creating a situation that the grantor wishes to reach at some future date. Selection of grantees is based upon a number of characteristics, often depending on the domain of the desired activity. In the scientific realm, peer review is the principal means for selecting grantees to be funded, retaining a tradition of being the gold standard for scientific evaluation despite limited transparency of process [45] and accusations of impeding innovation [17]. Beam and Conlan [132] note that grants rely on the administrative apparatus of the grantee to manage implementation and execution of the process, so that the grantee has a much greater degree of discretion in comparison to contracted services. By the same token, grants tend to be highly visible areas of action. Large grants are typically covered in press releases, and an individual scientist's track record in winning grants is a determinant of later grant awards [17]. This visibility also serves to make grants for scientific research a target for cost-cutting measures by Congress [159]. Since the end of World War II, grant funding has been awarded less to individual researchers and more to centers and collaborative projects, as many have noted [127, 72].

Grants share a number of features in common with government-funded contracts. Both forms of funding can involve a financial commitment on the part of the contractor or grantee. For contractors, this may take the form

of some kind of initial investment or a bond ensuring completion of the contracted work. For grantees, this takes the form of so-called “matching funds”, which are dedicated funds contributed by the grantee to accomplish some portion of the work. Both contracts and grants feature recurring renewals of funding in order to continue the sponsored activity. This allows the activity the government wishes to foster to be extended with the partner who possesses the greatest experience in carrying out the activity. Both grants and contracts are frequently awarded based on the recipient's previous track record and expertise. Liebert [99], examining the determinants of receiving grant funding, finds that previous productivity is the largest antecedent for receiving grants. Likewise, Leroux [98] states that experience and track record are among the primary elements of bid evaluation in the contracting process. Finally, dependence on government funding in the case of both contractors and grantees leads to a sensitivity to political factors that would otherwise be absent [72], as organizations are affected by changes in agencies, administrations, and politics that affect their sponsors.

The differences between grants and the contracts that have become so popular for completing government work in the last two decades fall in the areas of goals, monitoring, and specification. While one of the most important facets of contract work is creating the specification of work for the contractor to complete [98], grant solicitations for research work are frequently limited in specification, by necessity, to topics of interest or to

specific research questions. In the world of research it is often unclear which tactics can be used to arrive at an answer, and novel solutions tend to be the rule for getting to new knowledge. As such, the scope of work, budget, and changes to the research program tend to be more flexible than in government contracts [157]. Frequently research requires exploration in order to understand the domain under study before embarking upon a research agenda. Monitoring is also different for research grants. In grant-funded situations, monitoring conducted by the grantor often consists of gathering information about the activities that the grantee engaged in during the period of the grant, rather than delivery of results [132]. Vannevar Bush, in his report supporting the creation of a national organization for supporting scientific research, noted that the returns on investments in basic science would greatly and certainly outweigh any failures along the way: “Statistically it is certain that important and highly useful discoveries will result from some fraction of the undertakings in basic science; but the results of any one particular investigation cannot be predicted with accuracy” [36]. Grants also tend to be paid in annual installments, rather than based on reaching milestones or delivering particular products; likewise, reporting for grants happens on an annual or periodic basis rather than on the frequent reporting schedule that is common for contracted work; and finally, discretion for leadership in the project lies largely in the hands of the grant's principal investigator rather than a government supervisor [157].

69 Performance management for grants is perhaps the most important dif- ference between grant funded work and contracting work, and monitoring of scientific progress has singular difficulties. Partha and David [121] cat- alog the difficulties of economic evaluation of research: economic returns may come quickly or may take decades to realize, rights to intellectual properties are difficult to extract economic rents from (in fact restricting access to research may hamper further returns on initial investments), fundamental research progress may have dramatic and far-reaching im- pacts that are difficult to capture, and it is especially hard to forecast the success of any one particular research project. As noted above, Han- son’s [78] appraisal of focusing on effort as opposed to results has marked the transition from measuring research outputs towards measuring re- search processes. The standard operating procedure for evaluating both inputs (proposals) and outputs (scientific work) of grant-funded projects remains the peer review process. Garcia and Sanz-Menendez [71] discuss the context of peer review as metric of scientific research quite fully in their evaluation of competition in research initiatives. The authors begin tracing the path of peer review with the assertion that individual reputa- tion and credit are central to the creation of the social structure of sci- ence, and that recognition by ones peers is the foundation of legitimacy and leadership in a given field. Garcia and Sanz-Menendez note that the measurement of scientific production has long been based in volume and quality of scientific publications, but that these metrics cannot be sep-

70 arated from peer review. Peer review, despite some of its flaws outlined below, is not only the basic mechanism for ensuring quality of research, but also a critical factor in monitoring the efficiency of government invest- ment in science. Peer review provides legitimacy to governmental bodies, and scientific work which has passed peer review has greater esteem in its scientific surroundings [71]. However, with the advent of new initiatives in government for assessing and monitoring of performance, peer review has had mixed fortunes as an evaluative tool for officials in charge of awarding research grants.

2.2.2 Making Grants Measurable

Shapira and Kuhlmann [140] describe the growth in requirements for evaluation of research projects as governments attempt to control costs and derive better benefits from programs, noting that there are significant issues in measuring performance in this area. Impacts of these programs tend to be diffuse, as do costs, leading to difficulties in capturing all of the inputs and outputs. As research programs grow in complexity, including more disciplines and addressing broader problems, the evaluation of these programs must similarly become more complex. Government demands for continuous monitoring and program learning initiatives for research have led toward the inclusion of subsidiarity, socioeconomic effects, and broader impacts in research evaluations. The increasing frequency of public-private partnerships for research also increases the complexity of program evaluation [140]. Dasgupta and David [121] characterize the new attitudes towards measurement of research projects as a new economics of science, in which the previous free-market scientific workplace, characterized by scientists competing in the peer review process in order to gain funds and recognition, is supplanted by a more interventionist government hand that is in the process of turning science toward more applied tasks. Government demands for better program evaluation, both in the US and in Europe, as well as budgetary constraints from the recent economic crisis, have resulted in a call for more scrupulous examination of research performance.

The response to this call for increased evaluation and measurement of performance has been variable at best. Cozzens, in [140], provides the context of evaluation in US research funding and describes the clash between the traditional evaluation tools for research, peer review and the journal selection process, and the new requirements based on the Government Performance and Results Act (GPRA) and increased requirements for management performance from the OMB. Peer review as the status quo for evaluation of science works in what Cozzens describes as an "autonomy-for-prosperity" model. Agencies support research activities in order to solve specific problems in an indeterminate amount of time, with limited oversight from Congress or agencies. Emphasis in evaluation is placed on the input end of the process, based on the quality and relevance of research proposals; most importantly, the accountability of this evaluation is placed on the research community, which is responsible for making decisions fairly, rather than on the researchers themselves to deliver results to the general public. Guston [76] notes that peer review makes up a substantial amount of the selection process for research: $37.7 billion, or 86% of the reported total funding for research, is merit reviewed. Applied research agencies, in contrast, have review processes based in personnel evaluation and budgeting that determine quality, although Cozzens, Bozeman, and Brown [47] note that there is a shift towards the competitive model of peer review even for these agencies. Peer-reviewed grants are a feature of new federal research funding programs in the Department of Agriculture, the Environmental Protection Agency, and the Advanced Technology Program [76]. Peer review for research projects can happen both prospectively, in the proposal selection process, and retrospectively, in evaluation [116]; it has also been used as an input for drafting regulation, creating state policies, and evaluating courtroom decisions [76].

The peer review process conflicts directly with GPRA requirements for monitoring outputs of research, which explicitly focus on planning and achieving strategic objectives, rather than on a culture of fairness in the evaluation of proposals. As Cozzens [140] states, this "clashes with the traditional notion that the benefits that flow from research cannot be predicted in timing or content, but rather are visible only retrospectively". The new evaluation requirements have been met with measures that are generic and qualitative: outcomes such as "advances in knowledge" or "ideas, people, and tools", or the NSF's frequently sought-after "science nugget", used to provide Congress with information about program success in a brief, easily-digestible package. Such weak measures of evaluation lead to the reinforcement of existing political forces, especially when coupled with another popular new metric, stakeholder input, which gives greater voice to those parties already engaged in the selection process [140]. Another approach to evaluation is to provide broad indicators of research progress: publications, funded research, and patents. Campbell [38] directly contrasts the peer review and indicator approaches, finding that peer review results in complex but subjective evaluations of research work, while indicators are objective and easily quantified but tend to be superficial in nature. Hagstrom [77] notes that peer reviewers frequently are able to identify the authors they are reviewing, or at least make educated guesses based on prior research and citation patterns. There is some evidence that researchers understand the peer review process and anticipate elements of it when drafting proposals. Knorr-Cetina [90], comparing proposals submitted for peer review with those not subject to it, found that the style of the proposals changed rather than the scientific content. Furthermore, peer review is frequently conducted by established researchers, which leads to a problem in the assessment of new and innovative research directions, and to relationships of mutual dependency that create self-reinforcing factions within scientific communities [38]. The world of the peer-reviewed scientist may be viewed as one mired in competition with other researchers – first to get research proposals approved in order to get funding, and then to get the results of that research published.

2.2.3 The Competitive Element

Competition is in many ways the coordinating feature of scientific progress, just as it is in economic activities. Hagstrom [77] describes the competition that takes place between scientists as occurring specifically when a scientist finds that her research in a particular area, on a problem not previously published, has been beaten to publication by another researcher. This form of competition may be extended to include being passed over in favor of another researcher in the grant selection process. Latour and Woolgar [94] established research funding as a vital part of researchers' credibility and reputation with other scientists. Garcia and Sanz-Menendez [71] sum the idea of competition up well: "Thus, competition for funds is an essential mechanism in the cognitive functioning of research, articulated in the credibility cycle, and a vehicle for relationships between science and government".

Competition between researchers has a number of valuable features that aid scientific development. Competitive publication practices mean that additional researchers may be working on the same problems, which ensures an abundant supply of possible investigatory techniques and results. Competition drives hard work on the part of the competitors to outdo each other. Finally, competition reduces the risk of dilatory publication, and it encourages differentiation and innovation as scientists attempt to identify new problems to explore [77]. While competition should promote the best quality research, issues have been identified with competitive processes for publication and funding that may slow the progress of science. Competition thus has a complex relationship with peer review. Laudel [95] notes that the competitive process may affect the course of science, as risk-averse scientists select research topics that avoid competition and increase their favor in peer review, or promote mainstream and existing research techniques in order to be more competitive with particular review boards. Reports from leaders in grant-funded research centers find that competitive resubmissions for funding have a disruptive effect on getting the work of the center done [143].

In the case of the Supercomputing Centers and the TeraGrid and XSEDE projects, funding cycles create periods of collaboration and competition. One respondent from the TeraGrid period mentions "coopertition": tighter cycles between NSF solicitations being released and the performance periods that centers are engaged in create situations where staff are expected to work together to make one project a success while preparing competitive proposals against their coworkers for the next cycle of awards. Other cyberinfrastructure experts in the same study state that cyberinfrastructure is not built in a three-year investment; longer time-frames are required in order to create robust, high-quality infrastructure that can support activities over the long term [172]. None of the current supercomputing centers in the US are run as facilities programs by the NSF; instead, they survive by engaging in multiple grants for cyberinfrastructure activities, activities which are receiving less funding. A pair of center staff calculated that a recent solicitation during the time of the XSEDE project specified a system whose total allowable budget would not cover the electricity needed to run the system over the life of the grant, had it been awarded at their center. One of the respondents to this study jokingly refers to the situation, noting "the amount of money you lose on each individual award is made up for by the lack of frequency in awards coming out". Clearly the center heads feel that they are subsidizing the activities of the NSF, and most agree on the need to diversify funding sources by participating in multiple research grants while simultaneously competing for the infrastructure projects that provide the resources which give them credibility and weight in the cyberinfrastructure community [172].

2.2.4 Public Management Networks for Service Delivery

Alternative delivery structures in the form of networks, cooperative arrangements, and institutions have become the de facto standard for service delivery for the US government (as well as for many western nations), for a number of reasons. Understanding these arrangements helps us as scholars determine their effectiveness and the types and varieties of arrangements that may exist, as well as the ramifications of administrative arrangements that cross organizational or sector lines for public representativeness, for responsiveness to constituting bodies, and for other networked organizations. In order to understand XSEDE's environment, it is necessary to have a firm grasp on the concerns of the stakeholders who are investing in the XSEDE project and other computational infrastructures. This section explores some of the public management theory about networks for provision of services. While only a limited amount of the research here relates to information technologies, it does describe the conditions and constraints that determine the activities of the NSF, the NIH, and other agencies which support both basic and applied research in the US.

Cooperative arrangements are characterized by member organizations that are oriented towards a particular purpose (although not necessarily on all points), that work in concert but without hierarchical authority over one another, without formalized relationships or group practices, and usually with limited coordinative power. These structures can be from the same sector or from a mix of sectors, and they can be aligned by common purpose in one space (to solve a particular problem) or in individual spaces (to solve the same problems locally).

The first component of evidence supporting Frederickson and Smith's [67, 66] claim that such structures have become the standard for service delivery is the funding provided by the federal government. As little as 15% of federal expenditures are direct funding of government agencies, which is indicative of grant and contracting arrangements on a massive scale of operations. Some of these arrangements for particular goods and services are governed by formal contracts, to be sure, but as Scott and Davis [138] describe modern organizations, the bulk of these are also de facto network arrangements between multiple specialized producers.

O'Toole [120] provides important insights about why networks (from this point on, I will refer to these alternative delivery structures as simply "networks") are increasingly prevalent. O'Toole notes a number of features of networks that make them appealing for service delivery arrangements. First and foremost is the tension between the simultaneous public demand for less government and increased services. This conflicting demand requires governments to do more without creating additional bureaucratic agencies, so arrangements with non-governmental bodies must be created. If citizens distrust bureaucratic structures, the more palatable policy solution may be the creation of a networked structure. Second is the requirement of government to solve wicked problems. Wicked problems are complex, multi-dimensional issues that cannot be solved by a single agency due to the interrelation of factors that create them [131]. Some of the wicked problems that governments wrestle with are poverty, illegal drugs, and, now on a scale never encountered before the first decade of the twenty-first century, terrorism. These issues have multiple causes, moderators, and suggested solutions, and no single agency is capable of handling all of the dimensions; therefore networks of agencies, private firms, and non-profit groups with some kind of aligning characteristics work together. Networks also arise in order to deal with resource constraints, funding, or jurisdiction issues: if multiple localities do not have sufficient funding or power to create an over-arching organization to handle all of them, they may create individual organizations that operate in a network format in order to address issues. Finally, networks come together in order to coordinate the work of multiple specialized individual organizations to make the best use of expertise, resources, and experience.

Agranoff and McGuire [20] maintain that networks are not supplanting bureaucratic organizations, but rather that networks are made up of existing bureaucratic organizations that become members in networks. Generally in this case, responsibilities and activities are not devolved to the networks at large; they remain with the bureaucracy, with each member focusing on its own responsibilities and internal management issues. Klijn and Skelcher [87] examine some of the democratic implications of network structures, noting that networks tend to be managerial in focus, rather than representational. Networks exist in order to accomplish a particular purpose, rather than to create representative policy initiatives per se. Individual policies govern the activities of each network member, rather than an overarching policy that covers all members. In addition, despite their fluid nature (in comparison to hierarchy), networks tend to be largely stable, with few members entering and exiting the group. That being said, the resilience of a network may depend on the "tightness" of bonds between members and on the maintenance activities that go into the network. One study of loosely networked managers found that, apart from staff who acted as network intermediaries, public administrators spent less than 20% of their time working with the network and collaborating with network partners [20].

In terms of understanding networks, public management is largely starting with basic scientific principles and attempting to identify concepts similar to management principles already examined by the field. Scholars [20, 119] call for an identification of the variations of network types and their scale and span. It would be useful to understand what kind of life cycle networks exhibit: how they form, how they are maintained, strengthened, or weakened, and how they fail. Public management researchers are interested in the activities of network management that are similar to activities in hierarchical settings. Agranoff and McGuire have proposed analogues to hierarchical management such as activation and reinforcement, which are motivating and maintaining roles within network management. The study of public management networks also needs to explore the unit and level of analysis in order to be able to make claims about generalizability within or across network settings. It is important to understand how decision-making takes place in the network context, and what effects the nearness and farness of network neighbors have on choices for public managers. Scholars of public management need to understand what the mechanisms of coordination are in networks, what manifestations of authority may be present in networks, and how to measure the performance of networks. One possible source of authority in the absence of rational-legal authority may be reputational, which has its analogues in other examinations of socially-networked organizations. Researchers of public management networks should also be prepared to explore what role technology plays in network coordination.

Milward and Provan [108] provide comprehensive guidance on the empirical study of public management networks in their examination of four mental health networks. In comparing the four networks, they look at levels of cooperation and complexity, levels of activity, and the outputs of the network organizations. Provan and Milward identify antecedents for outcomes as especially important – what you have to work with determines what you are able to get done. This study provides concrete guidance on comparative network analysis, by providing variation in the observed organizations, comparing them on multiple dimensions, and attempting to identify output conditions.

Berry et al. [31] explore multiple streams of research in relation to public management networks in order to identify some useful directions for research. Social network analysis, as pioneered by [110] and formalized with the use of graph theory [161], provides some potentially useful empirical tools with which to analyze and understand network arrangements. Social network analysis focuses on ties between actors and mathematical representations of these ties: the number of connections particular network nodes have, their directedness, and properties of the graph of all actors and relations, such as the maximum geodesic distance across the graph, graph density and centrality, and the size and properties of the largest fully connected subgraph. Two of the useful techniques to come out of social network analysis in recent years are positional analysis and latent cluster analysis. Positional analysis relates to the relationships between small groups within networks (dyads and triads), and the likelihood of similar relationships within these small groups within the larger network. Latent cluster analysis identifies groupings of actors within networks based on the length of ties to each other and the similarity of connections each actor has. It is important to note that one of the tenets of social network analysis is that analysis pertains to individual networks, and can tell researchers about features within the networks, but social network analyses are not comparable across networks. Networks within the political arena may serve to give public management network theorists an understanding of the effects of influence within networks, and of identification between network members.
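To make these graph measures concrete, the following is a minimal sketch using Python's networkx library. The six-member network and its ties are invented purely for illustration and do not represent any organization discussed in this dissertation.

    # Illustrative only: a small, invented inter-organizational network.
    import networkx as nx

    G = nx.Graph()  # undirected ties between member organizations
    G.add_edges_from([
        ("Agency", "Coordinator"), ("Coordinator", "Center A"),
        ("Coordinator", "Center B"), ("Center A", "Center B"),
        ("Center B", "Center C"), ("Center A", "Center C"),
    ])

    density = nx.density(G)                # graph density
    diameter = nx.diameter(G)              # maximum geodesic distance
    centrality = nx.degree_centrality(G)   # per-actor centrality
    # largest fully connected subgraph (maximum clique)
    largest_clique = max(nx.find_cliques(G), key=len)

    print(density, diameter, centrality, largest_clique)

Consistent with the tenet noted above, these values describe this one network only; they are not directly comparable with the same measures computed on a different network.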

O'Toole and Meier [119], theorizing about the "dark side" of public management networks, present a number of issues with network management that affect public management networking specifically. These issues include cooptation, the shielding or masking of power, and the deferment of unpopular decisions. With the relegation of policy responsibilities to networks, government places an obscuring organizational level between itself and the public. Where hierarchical channels do not exist and formalization is minimal, the connections between network members are neither clear to legislative oversight nor visible to the public. Meier and O'Toole propose that this allows members of networks to exert undue influence on the rest of the network in the form of cooptation. Cooptation of a network changes the network focus from its original area to one preferred by the coopting member. The member can then coerce or elicit activities from other members that further its goals at the expense of the group goals, or network members can deliberately create confusion within a network and reduce focus on the goal so that less activity is carried out. One example of this may be the cooperative activities between the mining industry and its regulating agency that led to the lax safety procedures exposed in investigations of recent mining accidents. Members can also make use of this obscuring of official connections to exert power, or make use of the network identity to assume authority and accomplish initiatives the individual member would not be able to accomplish otherwise. Networks also insulate government in two ways: the ambiguity of network arrangements insulates the decision-making process from the public realm, and it insulates the government from responsibility during implementation. A government agency may pass responsibility for decisions and actions on to the network as a whole, thus avoiding the scrutiny of oversight and public officials. The removal of public oversight from the decision-making process is exceptionally worrisome to the authors, as it constitutes a reduction in democratic participation within the policy process.

In order to demonstrate the efficacy of networks, public management network research needs to identify and test analogs to structures in hierarchical public management. Researchers must create taxonomies of network varieties and sizes, and perhaps even borrow from ecological models of organization to understand the population of networks. Further research on resource utilization in the network context is required. Further questions on networks revolve around practices of coordinating and promulgating network activities: What are the management activities that take place in networks? What are the channels and structures of influence and coordination in networks? What are the conditions of successful performance? Moreover, if networks are structures designed to deal with wicked problems that hierarchical bureaucracies cannot, what defines success for problems that traditional forms of management have yet to solve? In order to understand the network life cycle, the antecedents of network operations, and network performance, researchers need to examine networks longitudinally. Researchers may make use of social network analysis techniques to perform relatively complex analyses of networks, even with limited members.

The relevance of public management network theory to XSEDE is in the effective execution of agency goals through collaborative networks of supercomputing centers. Some specific work on return on investment to the NSF has already been done comparing the overarching structure of XSEDE with providing the same services through individual (and competing) centers [150], finding that the NSF receives significant value from a coordinating organization. Some further exploration into similar programs, such as NSF's ACI-REF program or the iPlant Collaborative, is warranted. Investigation into these activities can also benefit the public management sphere: in contrast to the community or regional networks for human services on which many studies of networks have focused, these projects are highly specialized, make considerable use of information and communication technologies in order to increase their span of coordination and efficiency, and are exploring unique ways to use resources efficiently and effectively [43]. XSEDE and projects like it are also embarking on a number of initiatives in order to better ascertain performance and provide improved performance measurements to the NSF and to the legislature.

Chapter 3

NSF-funded cyberinfrastructure prior to XSEDE

The NSF’s initiatives to support computing have been a long-term evo- lution of infrastructure. In this section I detail the development of the Supercomputing Centers Program, which marked a transition from com- puter systems as research project in and of themselves to infrastructure intended to support computational science research. Following that is a description of the initiation of the TeraGrid project, which replaced the Centers Program as the main provider of computational resources for ba- sic science from the NSF. I trace some of the challenges and innovations of the TeraGrid, including researchers’ examination of the tensions and dy- namics of the TeraGrid project, which, owing to its structure and funding patterns, was a contentious project. Nevertheless, the TeraGrid managed to arrive at solutions for providing a distributed grid infrastructure for science. I contend that, thanks to the network organization of the Tera- Grid and the development of other related organizations during the three major programs that funded and extended the TeraGrid, a vibrant cyber- infrastructure community arose. The professionals who had worked in

87 the computational sciences, information technology, and other fields, be- gan to create their own network of those knowledgeable about research HPC projects.

3.1 The NSF Supercomputing Centers Program

The collection of resources and services currently provided by the XSEDE framework saw its genesis in the NSF Supercomputer Centers program started in 1985, and continued its evolution through the Partnerships for Advanced Computational Infrastructure (PACI) and TeraGrid programs. Each new program brought major advancements in the design and usage of cyberinfrastructure in support of research. The Centers program established a set of supercomputing centers which would lead the development of large systems [1]. From the mid-1980s through the 1990s, the Centers program, in concert with the NSFNET initiative, began the work of connecting these centers in order to provide access to computational resources to a larger group of researchers [6]. Starting in 2001, the TeraGrid developed a central organizational structure for obtaining accounts and allocations on systems and worked to establish common user environments, including a standard software set, as well as providing services to help optimize and utilize scientific software, along with training and education programs.

Much of the history of the NSF's support of researcher access to computing capability has been tied to the development of NSFNET and subsequently of the Internet, and accounts of the Centers programs are often closely bound up with the development of the Internet and the Mosaic browser developed at the University of Illinois Urbana-Champaign. Narratives about the development of the Internet and the World Wide Web frequently take center stage, but details about the development of the centers that these networking initiatives knit together are available.

Early support of computational systems by the NSF was focused on the development of computers themselves, rather than on systems which would support computational research. As computer technology became less experimental and more of a component of research, access to computational resources began to become an issue for the development of the computational sciences. The 1982 report of the Panel on Large Scale Computing in Science and Engineering, informally known as the Lax report, commissioned jointly by the DOE and NSF, noted specifically the difficulty of access to supercomputing resources for both research and defense organizations, and identified the development of more powerful computing resources as central to the research goals of the nation [96]. For the initiation of its supercomputing program in 1984, the NSF purchased access to supercomputer time at six sites: Purdue University, University of Minnesota, Boeing Computer Services, AT&T Bell Labs, Colorado State University, and Digital Productions [6]. This access program provided researchers with access to some of the most advanced computing facilities available at that time. In 1985 the NSF increased its support of computer access by helping to establish four supercomputing centers: the John von Neumann Center at Princeton, the San Diego Supercomputer Center (SDSC), the National Center for Supercomputing Applications (NCSA) at the University of Illinois Urbana-Champaign, and the Cornell Theory Center (CTC). These were followed by the funding of an additional center at Carnegie Mellon University (jointly administered with Westinghouse and the University of Pittsburgh), the Pittsburgh Supercomputing Center (PSC). These centers, together with the National Center for Atmospheric Research (NCAR), became the first endpoints on the NSFNET backbone.

In 1989, the NSF renewed funding of PSC, SDSC, NCSA, and the CTC. This arrangement held for eight years until 1997, when the NSF restructured the program, retaining funding for only SDSC and NCSA. The NSFNET model is informative in that it provided the first major step of integration, which required the separate centers to begin the process of standardization on networking protocols. This integration for networking activities presaged a larger push towards the provision of a common infrastructure with the mission of providing capabilities flexible enough to support a wide range of analyses while still providing advanced capabilities for resource-intensive analyses. In contrast to this integrative approach of the NSFNET model, the de-funding of CTC and PSC resulted from those centers' focus on continuing to deliver production science systems, rather than providing next-generation computer systems. These tensions between research at the edge of computational capability and the provision of production capability echo the issues Yood found at Argonne, writ large in a broader framework. As we will see, the NSF continues to balance the needs of the computational sciences with development in large-scale HPC systems.

In 1997 the NSF restructured the Centers into the Partnerships for Advanced Computational Infrastructure (PACI) program, investing simultaneously in the National Computational Science Alliance, led by NCSA with partners at Boston University, the University of Kentucky, the Ohio Supercomputer Center, the University of New Mexico, and the University of Wisconsin; and in the National Partnership for Advanced Computational Infrastructure (NPACI), led by SDSC, with partner computing centers at Caltech, the University of Michigan, and the Texas Advanced Computing Center. An additional PACI award was also made for training and education, headed by the Center for Technology in Government at SUNY Albany with participation from both the Alliance and NPACI [1].

3.2 The TeraGrid

Following a President's Information Technology Advisory Committee (PITAC) report which made strong recommendations about needed investment in high-end computing acquisitions as well as research in high-performance computing [84], the NSF funded a program in support of pushing national capabilities to the "terascale" range – that is, teraFLOP computing speeds, terabytes of data storage capacity, and gigabit networking links. The first award of this program went to PSC in 2000 for the LeMieux 6-teraFLOP system, followed by awards in each succeeding year through 2003 to continue building out terascale infrastructure: the Distributed Terascale Facility (DTF) in 2001, the Extensible Terascale Facility (ETF) in 2002, and Terascale Extensions in 2003. The name "TeraGrid" was given to the systems receiving the Distributed Terascale Facility award and applied to the extension awards in 2002 and 2003. The collaborative partners for the 2001 DTF award were ANL, Caltech, NCSA, and SDSC. These four centers provided computational capacity at a collective 11 teraflops and 450 terabytes of data storage via a set of identical IBM Linux-based clusters. Much of the initial expenditure of the TeraGrid awards was on connecting the sites with high-speed networks that would allow the computational facilities to be used in concert, as grid resources. The ETF award brought the PSC LeMieux system into the TeraGrid and augmented the capacity of the other TeraGrid sites. Terascale Extensions in 2003 brought in more centers to support the computational grid: Indiana University, Purdue University, Oak Ridge National Laboratory, and the Texas Advanced Computing Center. In 2005, the NSF funded operations, user support, and further development of the TeraGrid for a five-year period. During this time, additional awards added other sites as resources, starting with the National Center for Atmospheric Research in 2006, and the Louisiana Optical Network and the National Institute for Computational Sciences in 2007 [172]. My own involvement with the TeraGrid began when I was assigned responsibility for the TeraGrid accounts and accounting service for Indiana University's Big Red supercomputer system in 2007.

Following the end of the Terascale Extensions award, the NSF opened a new solicitation for proposals for the successor organization to the TeraGrid with the Extreme Digital solicitation, which had its own repercussions in the community, as I describe in chapter 7, "Findings". Shortly after XSEDE began operations, in 2012, the NSF drafted a new report on major initiatives, entitled Cyberinfrastructure Framework for the 21st Century (CIF 21), which would allow coordination of the various directorates of the NSF and their cyberinfrastructure investments, and provide capabilities for science community-building, data-intensive science, and computational models. CIF 21 activities include prescriptions for research use of XSEDE but also guidance for individual projects which had their own considerable computational needs, and for data-intensive activities that would support new analyses and greater interoperability between projects. A timeline of the projects, reports, and major awards is visible in Figure 3.1.

[Figure 3.1: Timeline of NSF-funded supercomputing activities, 1980-2020. Programs: Centers (1985), PACI (1997), TeraGrid (2001), XSEDE (2011-2016). Reports: Lax Report (1982), Hayes Report (1995), PITAC Report (1999), CIF 21 Report (2012). Awards: DTF (2001), ETF (2002), Terascale Extensions (2003).]

3.2.1 TeraGrid Innovations

The institutions supporting the TeraGrid were funded through a number of "cross-referenced but independent awards to autonomous institutions" [40] – arrangements which funded each of the individual institutions operating a supercomputer accessible to users of the TeraGrid, as well as institutions supporting the overall framework of the TeraGrid. The supercomputers and storage systems were referred to as resources, while the institutions operating these systems were Resource Providers (RPs). In addition to the RPs, the NSF funded the Grid Infrastructure Group (GIG), headed by the University of Chicago/ANL, to provide direction in security, authentication, resource allocations, accounting, and overall TeraGrid operations, including the help desk. The principal investigators at each of the RPs and the GIG made up the TeraGrid Forum, the center of strategic leadership for the TeraGrid and of coordination among the RPs and with the GIG. TeraGrid Working Groups and Requirements Analysis Teams (RATs) provided guidance on broad, project-wide concerns and targeted, specific initiatives, respectively. In order to provide outside input to the TeraGrid organization, the NSF mandated the creation of the Cyberinfrastructure User Advisory Committee (CUAC) [172, 40, 41].

As a large-scale distributed virtual organization, the TeraGrid dealt with the considerable challenges of making the first steps towards creating a common cyberinfrastructure. Accounts and accounting tools were developed and implemented, initial rules for a common unit of computing were developed, and an allocations process was created which would award users units to be spent on systems. The GIG developed a considerable amount of software which would let users know the status of a given RP system and provided test harnesses for frequently-used software. Beyond the Common User Environment – software implemented for use by users of the supercomputer systems – the AMIE accounting software for distributing account information and monitoring usage [137] and the INCA test harness system for monitoring software availability [142] were important components of the grid infrastructure. These ensured, respectively, that user information could be distributed and synchronized, with common accounting of usage across the many systems in the grid, and that systems which were expected to run a given software program could do so on a reliable basis.

As noted below, there were tensions between RPs around the implementation of the Common User Environment. The NSF's set of "cross-referenced but independent awards" created a virtual organization which allowed for the establishment and growth of weak ties [75] between supercomputing centers and new organizations. As the TeraGrid extended its hardware and software environment to new and different systems throughout its life, and as membership expanded beyond the original set of Supercomputing Centers, more organizations with different cultures joined, bringing new sets of expertise, different perspectives on usage and policies for sharing resources, and their own individual incentives and missions.

While these differences contributed to conflict over practices and activities on the TeraGrid, they also brought staff members together to participate in the Working Groups and RATs and increased the range and diversity of opinions and perspectives. Other activities conducted during the duration of the TeraGrid supported and extended these ties beyond the reach of the TeraGrid awards: a number of NSF workshops conducted by the Office of Cyberinfrastructure, the development of the Coalition for Academic Scientific Computing (CASC), and the development of regional partnerships that allowed for cooperation between different providers of cyberinfrastructure at the national, regional, and campus levels. There were also initiatives within the TeraGrid, notably its Campus Champions program, which developed the community of cyberinfrastructure users and providers. At the same time as this culture of cyberinfrastructure professionals was taking shape, the advent of commodity cluster-based computing and the rapid uptake of computational methods by many in the natural sciences meant that membership in the cyberinfrastructure community greatly increased, no longer restricted to a few highly-funded centers with staff around single "leadership" systems. Taken together, this period from 2001 to 2011 represented the formation of a broader community of cyberinfrastructure providers and users. Within the TeraGrid, staff commended the new arrangement for creating a network of like-minded application specialists, systems staff, and more [172] who could contribute to each others' work in infrastructure development.

While the arrangement of the TeraGrid by the NSF lacked the accountability of a traditional hierarchical, bureaucratic organization, it did provide the basis for creating a network of professionals who worked on similar projects and developed experience with one another. It also provided a fair test of the ideas put forth by Foster and Kesselman about a distributed grid that could provide computational capacity in the fashion that a public utility might. While the idea of on-demand computational supply is simple to imagine and an attractive solution to the computational needs of researchers, the TeraGrid, with its mission to provide a unified infrastructure to any scientist at a U.S. academic or non-profit research institution, faced a great number of challenges in providing a secure and flexible set of resources. First and foremost, much like any utility, a large-scale distributed computational grid needs some way to keep track of all of its users and of their utilization. This in turn required the specification of some kind of standard amount of usage. The distributed grid was never intended by the NSF to be a collection of homogeneous systems – the nature of large-scale infrastructure is that it is built out of multiple disparate efforts, connected by common standards. In order to have a standardized way of counting resource use and reporting service delivery, the TeraGrid developed a normalized unit of computational time (based on one hour of CPU time on a Cray X-MP system). The normalized unit (NU) is scaled into the allocation measure of the TeraGrid, the service unit (SU); at the time of the TeraGrid, one SU was roughly 20 NUs on a contemporary processor for the purpose of allocations [41]. The standardized SU allows for the acquisition and incorporation of increasingly powerful computers, while continuing to provide a baseline "currency" for the allocation and usage of resources.
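As a minimal illustration of this normalization – a sketch, not the TeraGrid's actual accounting code, with per-resource factors invented for the example – the conversion from CPU-hours to NUs and SUs works as follows:

    # Sketch only: converting CPU-hours into NUs and SUs.
    # The 20-NUs-per-SU ratio is the rough figure cited above [41];
    # the per-resource normalization factors are hypothetical values,
    # standing in for benchmarks against the Cray X-MP baseline.
    NUS_PER_SU = 20.0
    NORMALIZATION_FACTOR = {"resource_a": 18.0, "resource_b": 25.0}

    def charge(resource, cpu_hours):
        """Return the (NUs, SUs) charged for a job on a resource."""
        nus = NORMALIZATION_FACTOR[resource] * cpu_hours
        return nus, nus / NUS_PER_SU

    # A 64-processor job running for 10 hours consumes 640 CPU-hours;
    # on resource_a that is 11,520 NUs, or 576 SUs.
    print(charge("resource_a", 64 * 10))

The design choice here is that each new, faster system simply receives a larger normalization factor, so allocations denominated in SUs remain comparable across systems and across time.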

The allocation process was another necessity for having a successful distributed research infrastructure. As an alternative to a market for allocating resources – which, to be fair, presents a number of significant challenges in the research context – the TeraGrid solicited applications for SU allocations, which would be awarded based on scientific merit and feasibility. Researchers applied to the TeraGrid for allocations based on particular scientific proposals and discussed their scaling studies to show that their software would run, and run efficiently, on TeraGrid RP systems. The Resource Allocation Committee (RAC), made up of other computational scientists serving 2- or 3-year terms, decided the viability of each proposal and how much of the TeraGrid resource to allocate to that project. In some ways, it is telling that the TeraGrid should choose a method for allocating resources that is fairly similar to the model of its parent organization, the NSF: there is a set amount of available resource, scientists propose to use some fraction of that resource, and other scientists determine whether the proposal has merit and the amount of resources to allocate. This means of allocation is in alignment with the funding organization's own processes, and the strategy would be well understood by the NSF reviewers of the TeraGrid project itself. The RAC oversaw the selection of allocations proposals for requests of more than 500,000 SUs (large requests) on a semi-annual basis, and for requests between 30,000 and 500,000 SUs on a quarterly basis. In addition to these allocations, it was possible to request a "startup" allocation of less than 30,000 SUs, reviewed by TeraGrid staff and commonly focused on testing and development activities for implementing scientific software on the TeraGrid. Having a successful startup allocation and demonstrating that software could scale to take advantage of large computational resources was frequently cited as a necessary component of a future larger allocations request.

Demand for the computational resources made available by the TeraGrid was considerable, and the RAC was required to make determinations about a significant number of requests for resources. Catlett et al. [41] report that in 2005 and 2006, the requested NUs for the TeraGrid were 147% and 132% of the available capacity of about 881 million and 2.23 billion NUs, respectively. Allocations were awarded that were either specific to a given TeraGrid resource, or roaming allocations which could be spent at any computational resource. In effect, the allocations process gave TeraGrid users a set amount of resource that they could spend down over the life of their project. Allocated SUs were not fully analogous to a "currency of the TeraGrid", however. SUs from specific allocations could not be converted into allocations on other resources, and researchers could not exchange SUs, other than by allowing other researchers to join a project and spend against its allocation; a researcher could, for example, create an extensive project staff which would be able to spend against the project's allocation. Researchers could see their consumption and would be alerted when they used more than the allotted amount. On some resources, they would not be allowed to submit new jobs without replenishing their allocation by submitting a renewal or extension request. Analysis of usage by the management of the TeraGrid took a fair amount of effort during the early years, as RPs collected job data from logs and aggregated it on a quarterly basis for RP and, later, integrated reports to the NSF. The later development of the Metrics on Demand service [68] would provide a central location and functions for aggregating and analyzing usage.
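The review tiers described above can be summarized in a short sketch; the function below simply encodes the thresholds reported in this section and is not drawn from any TeraGrid software.

    # Sketch only: classify an allocation request into the review
    # tracks described above (thresholds as reported in this section).
    def review_track(su_request):
        if su_request > 500_000:
            return "large request: RAC review, semi-annual"
        if su_request >= 30_000:
            return "medium request: RAC review, quarterly"
        return "startup: reviewed by TeraGrid staff"

    # For scale, Catlett et al. [41] report requests totaling 147% of
    # the ~881 million NUs available in 2005 -- about 1.3 billion NUs.
    print(review_track(750_000))   # large request
    print(review_track(50_000))    # medium request
    print(review_track(10_000))    # startup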

3.2.2 TeraGrid Metrics and Incentives

Any government-funded activity needs to show results for the expenditures it incurs, and the TeraGrid organization was no exception. Identifying the results of programs designed to support basic science with computational resources was not an easily solved problem. TeraGrid managed the reporting responsibilities variously by tracking service delivery, outputs, and outcomes. Service delivery was the simplest of these tasks to tackle, as sites could track utilization and the number of jobs, as well as system availability, to show that services were being delivered consistently and usage was supported. Outputs were tracked in the form of user publications. While users were encouraged to attribute TeraGrid resources in their work and to report publications that were supported by the TeraGrid back to the organization, this was not incorporated into the allocations process as a requirement until the end of the TeraGrid awards. This lack of feedback from users resulted in TeraGrid being unable to easily record outputs throughout most of its operating lifespan. Finally, outcomes were solicited in the form of scientific discoveries, which were made available to the NSF variously as "science highlights" or "science nuggets", which could be provided in press releases or to Congress, with information linking the discoveries to TeraGrid activities.
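As an illustration of the service-delivery side of this reporting – a sketch under assumed record fields, not the TeraGrid's actual pipeline – per-job accounting logs of the kind described above can be rolled up into quarterly per-site totals:

    # Sketch only: aggregate per-job accounting records into the
    # quarterly service-delivery metrics described above. The record
    # fields (site, quarter, cpu_hours) are assumptions for illustration.
    from collections import defaultdict

    def quarterly_totals(job_records):
        """Return per-(site, quarter) job counts and CPU-hours delivered."""
        totals = defaultdict(lambda: {"jobs": 0, "cpu_hours": 0.0})
        for site, quarter, cpu_hours in job_records:
            totals[(site, quarter)]["jobs"] += 1
            totals[(site, quarter)]["cpu_hours"] += cpu_hours
        return dict(totals)

    jobs = [("NCSA", "2006Q3", 512.0), ("NCSA", "2006Q3", 64.0),
            ("SDSC", "2006Q3", 1024.0)]
    print(quarterly_totals(jobs))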

These metrics for reporting, and the fact that individual awards were initially expected to report individually, resulted in a few difficulties and perverse incentives in the organization. As noted above, each RP and the GIG was initially required to report separately, moving only late in the life of the TeraGrid to a unified report structure guided by the TeraGrid Forum. The difficulty of providing homogeneous systems despite the creation of the Common User Environment, and the uneven application of software, paired with metrics tracked per RP rather than per project in the early days of the TeraGrid governance structure, created an incentive for RPs to capture users: providing better support for users that tended to use more resources, and making sure that they continued to be happy with the resources they were running on. This was aided by the fact that users tended to use and trust resources at their local institution, rather than those provided by other centers. Finally, the lack of simple mechanisms to transfer data around the TeraGrid contributed to the transition costs that users bore if they moved from one resource to another. Dan Reed, at that time at NCSA, is credited with coining the saying "Having people use your supercomputers a lot is very nice, but if you have their data, you have their soul." By retaining users who would consistently use a particular RP's systems – for instance by making sure they had the needed software for their analyses, providing better support, or serving a sector that other resources did not – RPs could demonstrate that their systems were necessary and could fulfill metrics on their own awards, ensuring that their demonstrated performance could be used as material for later proposals.

Other characteristics of TeraGrid rules and reporting made for conflicting incentives within the organization. Based on my interviews with staff involved with the TeraGrid project at Resource Providers, I learned that for a period within the program, officers at the NSF stated that only distributed work would count as TeraGrid jobs; that is, the metrics for the TeraGrid meant that only jobs which were executed on more than one system counted as service delivery by the system. While this did provide an incentive to capitalize on the distributed nature of the TeraGrid and foster the development of grid computing, this requirement was put into place after the move from four identical systems to multiple heterogeneous resources. In reality, the software for managing workflows which span more than one system was still in the development phase. While demonstrations utilizing multiple systems were conducted, user workflows that made use of more than a single system were a rarity. As such, the TeraGrid Forum members regarded the requirement as a mismatch, obscuring the actual service delivery which the TeraGrid conducted in favor of incentivizing the support of research which was in reality quite rare and difficult to carry off.

Another de facto requirement was imposed in the form of the NU for measuring utilization. My respondents also noted that the NU as originally designated in the TeraGrid strictly constrained the characteristics of systems which could be used as RP systems. There was no allocation equivalency for storage or visualization resources. Systems which utilized architectures different from those the standard benchmarking tools required, whether as their central processing unit or as what we now describe as accelerator cards, were not allocable resources either. This approach meant that the traditional high-performance system as the main means of providing computational resources was reinforced, and new and innovative technologies would not be counted as contributions to the TeraGrid. While this cemented the notion that the TeraGrid was a production science platform, rather than a research and development project, it limited the flexibility of the TeraGrid significantly. Not having an allocable visualization resource also considerably reduced the range of offerings that the TeraGrid could provide, despite the fact that most TeraGrid partners agreed that visualization was an important part of the research process. The lack of allocable units for storage cyberinfrastructure likewise meant that a TeraGrid file storage solution, whether distributed or centralized, had little weight in the considerations of the TeraGrid Forum. Between the omission of storage metrics from service delivery and the incentives for user capture across RPs mentioned above, there was little impetus to include capabilities that would ease data transfer around the TeraGrid. These disincentives were part of the organizational tensions that Zimmerman and Finholt identified and that I describe in the following section.

3.2.3 TeraGrid Tensions and Dynamics

While the goal was to create a distributed, interoperating system, the partners in the TeraGrid did not choose each other (except for a joint proposal between IU and Purdue), and the NSF managed growth with succeeding solicitations for resource providers, rather than providing a strategic blueprint for integration. One of the difficulties cited by the leadership of the TeraGrid during the Terascale Extensions period was in managing strategically in an environment where additions to the TeraGrid distributed cyberinfrastructure were uncertain. With the goal of providing new and innovative cyberinfrastructure systems, the awards that funded hardware acquisitions at the RP sites were focused on large-capacity, brand-new systems implementing new technologies. The NSF provided solicitations for new cyberinfrastructure proposals with fairly broad scope, so that TeraGrid leadership was aware that new resources and new RPs were on the way, but it was not possible to predict what the new resources would look like or who would be providing them. A situation could arise where the CUAC requested a new functionality that the GIG analyzed and developed, which would then be impossible to implement on a system awarded by the NSF the following year. This structure for adding new resources to the national cyberinfrastructure made planning for the future difficult for the TeraGrid GIG as well as for the RPs [153].

Owing to its structure as a set of multiple interrelated programs, the TeraGrid exhibited a number of tensions and dynamics. These tensions are embodied in some of TeraGrid's major charges: supporting deep research in the computational disciplines, broadening use of computational resources to new communities and new domains of research, and providing an open cyberinfrastructure for further extensible development. First and foremost, the structure of multiple RPs operating together with the GIG made for an uneasy relationship between the varying service delivery mechanisms of the project. The supercomputing centers which were TeraGrid RPs still carried out other activities for their local institutions, and still needed to compete for and win NSF awards for new research and new infrastructure. This meant that the RPs operated in an environment that was collaborative and competitive by turns – the centers acting as RPs still needed to retain autonomy and show that they had advantages over other centers. Other institutions, which had not been part of the supercomputing centers program, had an incentive to win RP awards or, in some cases, to provide local resources in order to improve their own standing as institutions capable of implementing and managing RP systems, and to further support their activities as part of the national cyberinfrastructure community. Under the TeraGrid, RPs were not always incentivized to cooperate with each other. While the GIG was charged with providing a common infrastructure and operational software, as well as the Common User Environment (CUE), the interests of the RPs were not necessarily served by installing all components or by providing a common set of interactions for every TeraGrid user. This tension was also evident in relationships between local responsibilities and responsibilities to the TeraGrid virtual organization. Under the TeraGrid, as is common under most grant-funded activities, many staff had percentages of time dedicated to the TeraGrid while retaining additional local responsibilities. Local supervisory staff had little insight into what staff responsibilities to the TeraGrid actually constituted, and TeraGrid staff reported that getting responses from a staff member at another RP site could be problematic. A collection of independent awards to different institutions, the TeraGrid had minimal controls for ensuring that all RPs were working in concert according to TeraGrid goals [172].

Likewise, management and coordination functions were not built into the project as part of the NSF's solicitation for the TeraGrid partners or RPs, forcing the TeraGrid members to develop their own strategies for unified project management. The resulting situation is referred to in one account as the "post-facto application of project management". While many project management processes are based upon the assumption that the project revolves around developing an idea, proposing the idea to stakeholders, and executing the construction or implementation of the project, the TeraGrid activities consisted of the operation of a distributed cyberinfrastructure (the services provided by the RPs) but also the process of identifying, developing, and integrating new capabilities (the GIG responsibilities of analyzing needs, gathering requirements, creating solutions, and testing them). This required the TeraGrid to adopt management strategies that would assure reliable support of operations as well as development activities that would meet the need for new software. The leadership of the TeraGrid evolved project management over the course of the Terascale Extensions award: from separate activities for the GIG and each of the RP sites, to beginning to unify planning and reporting functions for all members in the third year of the project, and, after some difficulty in identifying the correct scope for work breakdown structures and program plans, to creating a fully-integrated planning process for all members in the fifth year of the project [153]. The adaptation of the TeraGrid Forum to incorporate project-wide planning and management in order to deal with the difficulties of managing a large-scale distributed project highlights some of the challenges that make cyberinfrastructure integrating multiple resources over the long term a complex process. Even given the fast-paced development of HPC systems, the lifetime of a system and the need to interoperate over the long term mean that infrastructure must be managed strategically, and network relationships must be effectively leveraged, so that the organization can continually deliver services and not find itself bogged down by internal difficulties. Nevertheless, the TeraGrid Forum could only make decisions that affected all members with complete consensus among those members. This liberum veto meant that any of the RPs involved in the TeraGrid Forum could effectively stop a new development.

A second set of tensions identified by Zimmerman and Finholt lay in the mandate of the NSF, continuing through the XSEDE project as well, to provide next-generation resources at the edge of technological capability, but also to engage new computational users from diverse communities and to provide robust services for the broader CI environment. This put the TeraGrid in the position of needing to serve both highly advanced users and very inexperienced ones, including those who were encountering computational methods for the first time. Providing high-quality and available services serves both the advanced computational users and the new users well, but it put the project at odds with the goal of providing next-generation resources, which by definition have an experimental nature. These tensions, when coupled with the RP model for providing services, meant that some RPs were better positioned to provide the higher-capability services, and some were more focused on general-purpose computing that would be accessible to new communities. Certain of the TeraGrid resource offerings were aimed at particular groups of researchers: the Blacklight system at PSC shared up to 32 terabytes of system memory among all nodes of the system and remained in service for a number of years based on the needs of software for the genomics and natural language processing communities. Other systems, including those provided by new entrants to the cyberinfrastructure community, had less of a fit, owing to the local site's expertise (in contrast with the decades of experience at the original NSF Supercomputing Centers program sites) or to different architectures. Observers reported that some activities needed to be done to bring users to those resources and help them adopt the systems [172]. In one case, local users at the resource provider's institution were advised to access the resource through the TeraGrid in order to help drive adoption of both the resource and the TeraGrid's general offerings.

Zimmerman and Finholt outline a third set of tensions in the TeraGrid project: between the reliability and sustainability of infrastructure and the research and development function of implementing new, large-scale systems and ensuring interoperability with future systems and architectures. In their report, they note that staff and users

109 themselves had trouble defining the balance between TeraGrid as a re- search and development project aimed at improving computer science and TeraGrid as a cyberinfrastructure which would provide long-lived support that would be extended over time to new and different systems. While all of the staff interviewed for the report agreed that no computational research should suffer as the result of research and development activi- ties in the TeraGrid itself, past that point, opinions differed greatly about the activities of the TeraGrid in relation to providing resources for science versus being a science project in and of itself. Certainly, at the start of the Distributed Terascale Facility, the emphasis was on creating a homo- geneous grid system – the resources were homogeneous, and they were designed to act as a linked system via dedicated networking. Under the DTF, the systems offered by the four centers, their software and configu- ration could largely be managed in lockstep. With the introduction of the Pittsburgh systems under the ETF, the TeraGrid was now dealing with a heterogeneous system, which required considerably more coordination and also limited the ability of software, which was frequently compiled by users with their own configurations and tunings, to be copied to other TeraGrid systems and run without problems. With successive additions to the the project in ETF and Terascale Extensions, the goal of homoge- neous distributed systems was discarded and the task of the TeraGrid was to create a flexible and open environment that could support compu- tational usage across all sites. Differences between RP systems persisted,

110 however, and the difficulty of getting a particular set of analyses set up and running on the TeraGrid required an investment of time which meant that there was a considerable switching cost for all but the most flexible or design-oriented researchers. For their part, most researchers noted that they preferred to focus on the execution of their analyses in place, rather than making code portable to the multiple systems of the TeraGrid. At the close of the TeraGrid project, the partner projects had created the basis of a distributed computing cyberinfrastructure. The GIG and CUAC were able to implement a system with allocated resources, based on peer-reviewed requests. A general parity of software packages was avail- able throughout the TeraGrid, despite difficulties in getting all TeraGrid Forum members to agree. Researchers could take advantage of significant resources without spending funds for computing power or having to write expensive purchases into grant proposals.

Chapter 4

Research Methods

This investigation utilizes both quantitative and qualitative techniques in order to provide an informed picture of XSEDE and the national cyberinfrastructure community. As part of my responsibilities working for the XSEDE project as a deputy and then manager for the XSEDE Campus Bridging team from 2011-2016, I had considerable access to XSEDE management meetings, allocations meetings, staff meetings, and project activities involving XSEDE users. This access, with the support of the XSEDE Senior Management Team, afforded considerable opportunities for observing interactions among staff, including between staff of different supercomputing centers, as well as staff interactions with XSEDE users and between users themselves. In order to assess some of the claims being made about XSEDE's activities and those of its user base, I turn to the extensive usage and publication data which XSEDE collects in order to measure and assess performance and report successes to the National Science Foundation and other stakeholders. This data provides me with some means to investigate the ideas laid out in the interview and observational data. In the following sections I provide some detail on steps taken to ensure that my own observations are reliable and not unduly influenced by any party, describe the qualitative and quantitative data collection activities, and explain the methods I used to assemble my quantitative findings.

4.1 Investigator Statement

As part of the explication of data collection and generating reflections on the data, it is necessary to describe the steps I have taken to maintain a perspective that is not overly influenced by my informants. Furthermore, as a professional IT worker who is paid to participate in the project that I am observing, I must take pains to identify influence and be aware of the implications of that relationship. Instead of being an investigator who is allowed access to the project and its staff in order to conduct interviews and observe, I have been a part of the project efforts since the project's beginnings in 2011, through the completion of XSEDE, and currently in the follow-on project. While conducting these observations and interviews I was first a deputy and then manager of a project team working in XSEDE management. This afforded me a level of access quite beyond what a regular investigator might be provided. As a manager within the project structure, I participated in both staff and management meetings, was able to have conversations with the Principal Investigator of the project as well as the NSF Program Officers for the project, participated in the drafting of reports and proposals associated with the project, and had to formulate policy for my team's initiatives. I also was able to discuss my findings and reflections with the XSEDE external evaluation team.

My situation with XSEDE and my research on the project represent a balance of access to informants and materials against becoming too close to the project mission, goals, and activities. Part of this balance stems from the fact that, while I have worked in IT for a number of years, my background is far removed from that of most of the management and staff of the XSEDE project. Rather than being trained as a computational or computer scientist, my background is in the humanities and the social sciences. I came into the academic IT world shortly after I arrived at Indiana University to pursue my master's degrees, working in a support role for unix systems in university departments. My own computational methods are limited to social network analyses and some work in basic statistics, which require the use of supercomputing systems at their most rudimentary level. For the most part, I remained outside the world of NSF cyberinfrastructure until I almost simultaneously started my PhD studies in Informatics and began work with the XSEDE project. This conveniently provided me with a topic of study that I examined in many of my classwork activities. I would eventually develop a dissertation research project that would benefit from participating in work with XSEDE as a project team lead and working for the first time with other team members in a virtual organization. This was facilitated by the XSEDE organization's openness to allowing a number of researchers to conduct research [86, 27, 172].

I am not alone as a researcher of XSEDE; the distributed nature of the project and its membership means that the project staff of XSEDE and my informants come from a fairly broad spectrum of backgrounds. While I may be one of the few individuals involved in XSEDE who does not come from a background in the natural sciences, the members of the community that I encountered in interviews and discussions have a fair amount of diversity, albeit a diversity that is cultivated rather than organically representative. The XSEDE project takes pains to increase inclusivity and representativeness as part of addressing NSF's "broader impacts" requirements. While the management of XSEDE is largely white and male, initiatives to increase the inclusivity of the project are in evidence. While selecting informants I attempted to engage not only those who had been in the project for long periods of time and were involved in the center of the organizational activities, but also those who had had marginal interactions and had thoughts about the nature of the project and its activities, the services provided, and the common modalities of use.

Apart from coming to the community as an outsider of sorts, I have, throughout the process of data collection, meeting with XSEDE staff and users, and observing the cyberinfrastructure community, attempted to provide members of the community with the opportunity to comment and improve upon the findings. This gives the community a means of reflecting on those findings and providing their own voice in the research. In my observations of XSEDE and the national cyberinfrastructure community I have found plenty of differing opinions on the way things are and the way that they should be, but the vast majority of the XSEDE staff and management I have observed have been motivated by the support of science more than any other factor. As I have seen, the XSEDE community strives to be dispassionate, egalitarian, and inclusive, and it is my hope that this research project embodies those values as well.

4.2 Qualitative Analyses

The XSEDE project provides ample opportunities for qualitative data collection. As part of my activities with the XSEDE project, I was able to observe interactions between staff, management, and users. I was provided access to XSEDE's documents (most of which are available to the public), and was also party to the drafting process of many of these documents, which greatly improved my ability to observe the conduct of XSEDE as the project created its narrative for the NSF, its advisory boards, and the researcher user base. Below I detail the forms of document analysis, interviews, and ethnographic observations conducted, as well as the context for data collection.

4.2.1 Document Analysis

Throughout my work with XSEDE, I frequently turned to XSEDE's internal documentation in order to understand the project activities in areas I was not immediately involved with, to keep acquainted with the project structure in terms of who was responsible for what, to maintain an understanding of the organization as it went through a number of minor reorganizations throughout its five-year span, to orient myself within the project organizational structure, to understand reporting lines, and to hypothesize relationships between units within XSEDE. Due to the changing nature of NSF requests for information and frequent reviews with suggestions on how the project might be improved, XSEDE tended to be a moving target. XSEDE project documents allowed me to understand what changes occurred over the life of the project (2011-2016) as well as to identify points at which some critical input had entered the project. For the most part, documents within the XSEDE project are available to the public as the result of government-funded scientific activities. The XSEDE Senior Management Team approved my use of XSEDE systems and documentation that supports daily processes and execution of the project's responsibilities.

The project maintains an extensive wiki with minutes of weekly team meetings, project meetings, and quarterly management meetings, as well as pages for project activities such as software architecture development, policy formation, and planning and executing team projects. The wiki is also used to collect documentation in draft, provide a space for team members to review proposed initiatives, and organize logistics. In addition to the informal documentation in the wiki, the project generates a significant number of public documents, for training and documenting procedures, on the XSEDE.org website. These public documents include the original XSEDE project summary and science case documents used in preparing the XSEDE proposal, quarterly and annual reports, program plans, the charter of the Service Provider Forum, the staff climate survey, and evaluation reports.

In addition to reviewing the quarterly and annual reports and the XSEDE wiki, I also subscribed to XSEDE mailing lists and reviewed the contents of discussion. These lists included the Training, Education, and Outreach list, the XSEDE Campus Champions list, and the Campus Bridging team list, as well as XSEDE team-wide communications and trouble tickets sent to my site as part of its role as a Service Provider. For user support, XSEDE maintains a trouble ticket system ("Request Tracker"), which captures help requests for XSEDE and distributes trouble tickets to relevant parties within XSEDE as well as at the Service Provider sites. The development and implementation team and extended community support team make use of a Jira issue-tracking system to coordinate software development activities, which provided information on internal development efforts. Finally, XSEDE utilizes a Risk Registry system by which the project can track risks to its activities and trigger responses to mitigate them.

4.2.2 Interviews

Over the course of investigations of the project I conducted 22 interviews with XSEDE management and staff members, as well as members of the XSEDE Campus Champions group, who represent scientists at universities and bring new researchers to XSEDE resources. These interview respondents ranged from those who had been part of multiple centers throughout the Supercomputing Centers program, TeraGrid, and XSEDE, to early career researchers at minority-serving institutions who were just starting to engage with computational sciences. In order to identify candidates, I selected people affiliated with the XSEDE project who were central to the project, such as the Principal Investigator and the NSF Program Officers (two program officers served during the project period), as well as the persons responsible for operations, software development and integration, allocations, and architecture. I also selected members of XSEDE staff who were involved in the areas of particular interest to me, including the leaders for broadening participation, for education and outreach, and for external evaluation. For users, I selected users at Indiana University who were close by and engaged in using XSEDE systems, thanks to referrals from IU's Campus Champion, as well as a set of users referred to me by the leader for broadening participation. I selected campus CI providers from individuals whom I had met in NSF workshops, XSEDE allocations meetings, and other project activities, who were not necessarily "power players" in the field, but whom I observed to be open and forthcoming in their discussions with other members of the cyberinfrastructure community.

Interviews with researchers, campus CI providers, and XSEDE staff focused on how each user became involved with computational sciences, how they started their interaction with the XSEDE project, the nature of their usage of XSEDE, and their use of different computational facilities provided at the campus level or by other organizations. I focused on allowing each interviewee to elaborate on their own experiences with the project, their needs for computational support, and their interactions with resources and XSEDE staff. Interviews were conducted in person (16 interviews), via Skype videoconferencing software (3 interviews), and over the phone (3 interviews), depending on the situation and availability of the interviewee. The in-person interviews were conducted at the XSEDE quarterly management meeting in August of 2016 (6 interviews), at the annual Supercomputing conference in Salt Lake City in November of 2016 (6 interviews), and in respondents' workplaces (4 interviews). Where possible, interviews were recorded with the permission of the respondent.

Interviews on average lasted just under two hours. Interview respondents were selected from management by soliciting volunteers who found out about my project through my presentations at XSEDE management meetings, by recommendation from other respondents, and by directly recruiting central individuals: the NSF program officers (2 interviews), the principal investigator (1 interview), and members of the senior leadership team (5 interviews). The basic breakdown of interview respondents by role is detailed in Table 4.1. Other interview respondents were recruited via reference from the training and outreach and broader participation managers, as well as through contacts generated via a software pilot project conducted on multiple campuses through XSEDE. This pilot project was intended to test software that provided a distributed file system for access from XSEDE resources in order to improve the ease of data movement and job submission, and it required biweekly or monthly teleconferences with program participants over the course of about two years. I also conducted "snowball sampling"; that is, each successive interviewee was asked to recommend further contacts in order to develop additional respondents. There was some overlap of roles in the respondents, as described in the tables. Some of the respondents were recruited via the XSEDE Campus Champions program, which enlists volunteers (both faculty and staff) on university campuses to provide training and outreach activities which foster the use of XSEDE resources; as such they are in a sense both users of XSEDE and involved with its workings. Another respondent is highly involved with the development of software which runs on a large number of HPC resources, including XSEDE resources, and participates in XSEDE management discussions as well as in other cyberinfrastructure initiatives. Finally, NSF program officers were scientists who had made use of cyberinfrastructure in the past or had active research agendas which made use of cyberinfrastructure.

Category            Definition                                              Number of Interviewees
Users               University researchers who make use of XSEDE            8
                    resources; may or may not have had an active
                    allocation at the time of the interview
XSEDE Personnel     Individuals who are partially or fully committed        10
                    to work on the XSEDE project
NSF Program Staff   NSF Program Officers for XSEDE                          2

Table 4.1: Number and Types of Interview Respondents

In selecting interview respondents I attempted to capture a broad range, from faculty who had not yet created XSEDE allocations but planned to make use of XSEDE resources to support their own NSF-funded research, to those who had been making use of XSEDE resources and developing software for grid infrastructure since the TeraGrid project. Staff were asked about their perceptions of the organization of the project,

tensions between local institutions and the larger organization, and the changing usage of XSEDE by new disciplines and new institutions. For some staff, I especially focused on their responsibilities in bringing new users to XSEDE and what those users needed in order to start making use of resources. Interviews with NSF program officers focused on the NSF's goals for the project, the contrast between broader participation and next-generation computing, and the relationships between XSEDE partner organizations. Table 4.2 describes the respondents' demographic information, background, institution, and the interview format.

A list of sample questions from user interviews is presented in Appendix A. Each interview was based on these questions, and questions varied based on whether the respondent was a user (CI providers or campus champions were also asked user questions) or a staff member. I allowed respondents time to elaborate on their thoughts. Every interview started off with questions about how users got involved in computational research and with the XSEDE project. Users were asked about their use of XSEDE and other cyberinfrastructures, about their adoption of cloud resources and science gateways, about the types of analyses they used these systems for, and about the benefits they received from XSEDE. Staff were asked their understanding of the formation of the project as well as its individual directions and their own areas of responsibility within the project, and the types of user behaviors they observed as typical within the project.

Role                     Gender  Race              Background                                                        Interview Medium
user                     female  African-American  Early career engineering faculty, Jackson State University       Skype
user                     male    African-American  Tenured Condensed Matter Physics faculty, FL A&M                  Skype
user                     male    Middle-Eastern    Graduate student, Indiana University                              in-person
user/software developer  male    White             Director at Research Computing Center, University of Utah        in-person
campus champion          male    White             Campus administrator in Office of Research, FL Int'l University   phone
campus champion          female  White             Director at Research Computing Center, OK State University        in-person
campus champion          male    White             HPC Center Senior Staff, University of Arkansas                   in-person
CI provider              male    White             Director of Research Computing Center, University of Miami       Skype
CI provider              male    White             Director at Research Computing Center, City University of NY     in-person
XSEDE staff/developer    male    Asian             Software Engineer, Indiana University                             in-person
XSEDE manager            male    White             Research Center staff, National Center for Atmospheric Research  in-person
XSEDE manager            male    White             Software Engineer, Argonne National Lab, University of Chicago   in-person
XSEDE manager            female  African-American  Outreach Director, Southeastern Universities Research Association in-person
XSEDE manager/site PI    male    White             Director at Research Computing Center, Indiana University        in-person
XSEDE manager            male    White             Software Engineer, Indiana University                             in-person
XSEDE manager            female  White             Tenured Psychology faculty, Georgia Tech University               in-person
XSEDE director           male    White             Director at Research Computing Center, Cornell University        in-person
XSEDE director           female  White             Software Engineer, Texas Advanced Computing Center                in-person
XSEDE director           male    White             Systems Engineer, National Institute for Computational Sciences  in-person
XSEDE PI                 male    White             Director at Research Computing Center, NCSA                       in-person
NSF Program Officer      male    White             Tenured Theoretical Physics faculty, NIST                         in-person
NSF Program Officer      male    White             Tenured Engineering faculty, Purdue University                    in-person

Table 4.2: Interview respondent demographics and background

For all of the interviews I kept a series of notebooks in which I would write down notes immediately after each interview concluded, so as not to interrupt the flow of interviewing with writing. For interviews where it was feasible to record, I used a digital voice recorder, a recording app on an Android tablet, or a Skype plugin that allows audio recording of Skype calls ("Skype Call Recorder" [11]). In order to get a feel for the course of the recorded interviews, I transcribed four of the interviews using F4transkript software [3].

4.2.3 Ethnographic Observations

I framed my engagement in participant observation largely as defined by Schensul, Schensul, and LeCompte [134]: that of learning by periodically working alongside participants in the setting of the organization. As part of my responsibilities for the XSEDE project I worked alongside XSEDE management and staff. XSEDE conducts a significant amount of its synchronous management and coordination functions through teleconferences. There are bi-weekly meetings that coordinate most of the levels of the XSEDE Work Breakdown Structure (WBS), including a senior management team meeting between the principal investigator and directors, level 2 directors with managers, and managers with staff members. Bernard and Gravlee [29] note that a certain amount of misdirection is required in observational studies of this type, and I found that in my dual role as a person with responsibilities in the project and someone new to the project, meeting many of the participants for the first time, I was able to couch my curiosity about project workings as that of a new employee learning the ropes. My activities in observation often focused on following the discussion at hand and recording the interactions. For particularly controversial or complex interactions, I found that reflecting with another staff member (ideally one tangentially involved in the matter at hand) would provide additional perspectives on the matter. Following the guidance of other organizational ethnographers, I tried to engage with staff who could lead me to other engaged informants [169].

In my participation with the XSEDE project I engaged in teleconference meetings at all of these levels, as well as in calls with the XSEDE Advisory Board, which is made up of volunteer scientists who provide guidance to the project on initiatives to support research. I also attended individual project meetings, including a long-term software pilot project meeting to develop capabilities for sharing data and compute jobs between campuses and XSEDE resources. While these activities conducted over teleconference were subject to the mediating effect of the technology involved, it is important to understand that this is the daily context for most of the activities involved, and that for the bulk of XSEDE staff, in-person meetings were restricted to the annual XSEDE conference or other large-scale meetings. Although the audio-only teleconference did restrict the richness of the material to be observed, considerable signals remained to be examined, based on who attended, the topics discussed, who was paying attention, and what the individual actors had in mind.

Records based on participant observations were created with a basic format. I developed a habit of starting every meeting by opening my notes to a new page and noting the date and meeting topic; if any attendants were out of the ordinary, I would note that as well. I outlined the agenda of the meeting and recorded discussion points with the initials of the speaker in order to keep track of which staff made which suggestions. Nippert-Eng suggests a mix of diagramming, where helpful, in addition to text notes [115]. For my own observations, diagramming was not generally a feature of notes, perhaps due to the fact that many of the activities were conducted via teleconference, with no physical seating to chart or spatial relationship between participants. Certain activities might be captured in a flow diagram that would illustrate steps in a process, such as the convoluted sequence of steps to produce a quarterly report for the NSF, or the process for approving a piece of software for general use on XSEDE. For the most part I noted the flow of activities in outline form, most commonly annotating with an arrow to indicate influence or effect and stars to indicate importance.

By dint of my management position in the XSEDE project, I was able to conduct considerable in-person participant observation of the XSEDE project's decision-making and coordination activities. Face-to-face interactions included meetings with the management team, the XSEDE annual conference, meetings and conferences focused on cyberinfrastructure, outreach and training activities, and XSEDE allocations meetings. Management meetings held by the XSEDE project are three-day in-person meetings in which project planning and coordination take place, as well as the airing of general issues and new courses of action. These meetings are also where changes in the NSF's requirements for reporting and XSEDE initiatives were communicated throughout the structure of XSEDE. The XSEDE annual conference is an opportunity for users of XSEDE and interested cyberinfrastructure providers to present and discuss novel uses of the systems, attend training and networking events, and meet with other cyberinfrastructure users. During this time I attended NSF workshops and other group meetings that were also attended by management of the XSEDE project. These meetings included meetings of the Advanced Research Computing on Campuses (ARCC) group, the Coalition for Academic Scientific Computing (CASC), Internet2, the annual Supercomputing Conference, and the Open Science Grid All Hands meeting. While these functions were not specifically XSEDE-related meetings, members of XSEDE management would frequently attend and present on the work being done by XSEDE, taking questions and sometimes criticism on these topics. I also attended XSEDE functions to provide outreach and training to faculty at universities, where I and others presented on the available cyberinfrastructure, support, and services for their computational work. XSEDE allocations meetings were especially informative examples of interactions among computational users – these were meetings run by volunteer users who reviewed and advised on the approval or rejection of allocation requests for service. While XSEDE staff facilitated the process and advised about new developments, determinations about allocations were up to the Allocations Committee.

Attending these meetings provided ample opportunities to see how staff from different centers worked with each other, and with NSF program officers for the project, both in the context of presenting the organization to users and "behind the scenes" with staff members outside of open forums. My own position as a person new to national cyberinfrastructure and new to the organization afforded ample opportunities for me to attempt to fit in with these long-time cyberinfrastructure providers and users, to identify my own presumptions about the organization and how it operates, and to see the stories these groups tell themselves about the development of cyberinfrastructure and computational research.

4.3 Quantitative Analyses

In order to further my examination of XSEDE activities and to attempt to quantify the linkages between service and science outputs, I examine usage data based on XSEDE projects, together with publication data self-submitted by XSEDE users based on the resources provided by the XSEDE project.

4.3.1 Data Gathering

For the quantitative portion of this research, two main types of data are used: XSEDE publications collected by the XSEDE user portal and provided by the XSEDE project, and XSEDE user records and project data, provided by the XSEDE Metrics on Demand project (XDMoD). The data provided by XSEDE covers a span from the inception of the TeraGrid Central Database in 2003 through the transition to XSEDE in 2011 and up to July of 2016, when the XSEDE award completed. The data described below has been provided by XSEDE staff who create and present metrics for usage and application data, as well as by those who are engaged in project management and documentation of project results to the NSF.

Publications Data

XSEDE staff provided the contents of the XSEDE publications database, which is a self-reported database of publications supported by XSEDE. Individual users of XSEDE record their publications via the XSEDE user portal (XUP), where they can be viewed as part of the online user profiles, and are also used by XSEDE in order to measure and demonstrate the scientific output of researchers making use of XSEDE resources. I obtained a dump of the XSEDE publications database in CSV format which included all of the publications recorded in the XUP from its beginning through July 1 of 2016. The publications database as provided contains 7981 submissions, of which 6883 are associated with XSEDE projects and 1098 are recorded as supporting materials for XSEDE allocations requests. XSEDE staff note that the latter are not the result of work done on XSEDE, but rather work preliminary to a request for allocations, so these records were removed from the utilization analyses, as they do not represent utilization of XSEDE resources. Because the publications data is self-recorded by the authors and not normalized by the XSEDE user portal's intake process, it tends to be somewhat messy and requires some processing. Author names and publication titles were transliterated from UTF-8 to the ASCII character set. Some particularly long publication names included line breaks that needed to be removed from the initial data set in order to parse properly. For the purposes of a co-authorship network analysis, the data was reformatted as a BibTeX file and author names were unified. Records that could not be parsed from the XSEDE data into BibTeX were discarded. In all, 7978 total publications were obtained.
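To make the clean-up steps concrete, the following is a minimal sketch of this kind of processing in Python. The file name and the column names ("title", "authors", "entry_type") are hypothetical illustrations, not the actual XUP dump schema.

```python
# Minimal sketch of the publications clean-up described above. The file
# name and column names ("title", "authors", "entry_type") are
# hypothetical; the actual XUP dump schema may differ.
import csv
import unicodedata

def to_ascii(text):
    # Transliterate UTF-8 text to its closest ASCII representation.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")

records = []
with open("xsede_publications.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # Drop entries recorded only as supporting material for an
        # allocation request; they do not represent work done on XSEDE.
        if row["entry_type"] == "allocation_request":
            continue
        records.append({
            # Remove embedded line breaks that break downstream parsing.
            "title": to_ascii(row["title"].replace("\n", " ")),
            "authors": to_ascii(row["authors"]),
        })

print(len(records), "publications retained")
```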

Usage Data

The original dataset on projects and users in TeraGrid was compiled with the assistance of the XSEDE accounts management team. A representative of the accounts team ran SQL queries against the XSEDE Central Database (XDCDB), originally the TeraGrid Central Database (TGCDB), which tracks all XSEDE accounts, allocations, and resource usage. The retrieved data covers all user and project information from the inception of the accounting system in 2003 through August of 2015. It includes information for 20,003 resource allocations, comprising a total of 28,567,137,013 Normalized CPU Hours, for 5352 Principal Investigators. The XDCDB is populated by information collected by the Account Management Information Exchange (AMIE) system. All data was provided in comma-separated value files that can be easily processed programmatically (a minimal loading sketch follows the field list below). The project data includes:

• Allocation short name, or Project ID and allocation identifier

• ID and name of Principal Investigator

• ID and name of the PI Organization

• ID, organization, and name of the XSEDE resource used in the allocation

• Field of science, identified from 147 specified fields

• Base allocation, the initial project allocation in service units (allocations can be extended for long-term projects)

• CPU hour usage of the allocation
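As an illustration of how such a file can be processed, the sketch below aggregates CPU-hour usage by field of science using only the Python standard library. The column names ("field_of_science", "cpu_hours") and the file name are assumed for illustration; the actual export headers may differ.

```python
# Minimal sketch: aggregate XDCDB allocation usage by field of science.
# Column names ("field_of_science", "cpu_hours") and the file name are
# illustrative assumptions, not the actual export schema.
import csv
from collections import defaultdict

usage_by_field = defaultdict(float)
with open("xdcdb_allocations.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        usage_by_field[row["field_of_science"]] += float(row["cpu_hours"])

# Print the ten largest fields by CPU hours consumed.
for field, hours in sorted(usage_by_field.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{field}: {hours:,.0f} CPU hours")
```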

Additional data was provided by the XSEDE Metrics on Demand (XDMoD) site (https://xdmod.ccr.buffalo.edu). XDMoD is developed at the University at Buffalo and is detailed in [68]. It leverages the XDCDB as well as a number of probes which examine the performance of XSEDE resources, including individual application performance. XDMoD includes information which can be explored by a number of means, including graphical display of usage by PI, PI Institution, Field of Science, and Allocation, among many others. XDMoD also provides a number of means for organizing and visualizing data about XSEDE. Data from XDMoD can be exported into tractable data formats such as CSV for programmatic manipulation. XDMoD staff provided support in querying and using the XDMoD database. Reports from the XDMoD database allow the aggregation of usage and allocation on a per-project or per-PI basis. Sample data from the XDMoD project is shown in Table 4.3.

Allocation Name    PI ID   Resource   Field ID   Usage
TG-PHY100033       582     stampede   21         101611476
TG-MCA93S002       7       kraken     17         99563180
TG-MCA93S028       8       stampede   65         97955353
TG-CTS090004       7459    stampede   125        955559740
TG-MCB070015N      4801    comet      64         90510584

Table 4.3: Usage Data from XDMoD
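A per-PI rollup of an export in this format is brief with pandas; this sketch assumes a CSV file whose headers match the column names in Table 4.3, with a hypothetical file name.

```python
# Minimal sketch: per-PI aggregation of an XDMoD usage export whose
# headers are assumed to match Table 4.3; the file name is hypothetical.
import pandas as pd

usage = pd.read_csv("xdmod_usage.csv")
per_pi = (usage.groupby("PI ID")["Usage"]
               .sum()
               .sort_values(ascending=False))
print(per_pi.head())
```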

4.3.2 Bibliometric Analysis

Using the XSEDE publication data, a co-authorship network was extracted and author names were unified using the Sci2 Tool described in [37]. The resulting co-authorship network has 11,063 author nodes and 32,048 collaboration links. This network was then analyzed with the MST Pathfinder algorithm in order to detect the backbone structure of the network. Weak component analysis was run to identify the largest connected component. The resulting graph was analyzed for modularity in Gephi, and color was used to indicate which authors belong to which cluster.
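The pipeline above uses the Sci2 Tool and Gephi; as an illustration only, the sketch below reproduces an analogous pipeline in Python with networkx, using toy publication records. Greedy modularity maximization stands in for Gephi's modularity clustering, and no MST Pathfinder backbone step is included, since networkx does not provide one.

```python
# Approximation of the co-authorship analysis using networkx.
# Assumes `publications` is a list of author-name lists with names
# already unified; greedy modularity communities stand in for the
# Gephi modularity step, and no Pathfinder backbone is computed.
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

publications = [
    ["A. Smith", "B. Jones", "C. Lee"],   # toy records for illustration
    ["A. Smith", "D. Patel"],
    ["E. Chen", "C. Lee"],
]

G = nx.Graph()
for authors in publications:
    for a, b in combinations(sorted(set(authors)), 2):
        # Increment edge weight for each co-authored publication.
        w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

# Largest connected component of the undirected co-authorship graph.
giant = G.subgraph(max(nx.connected_components(G), key=len)).copy()

# Community detection via modularity maximization.
communities = greedy_modularity_communities(giant, weight="weight")
for i, cluster in enumerate(communities):
    print(f"cluster {i}: {sorted(cluster)}")
```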

4.3.3 Analysis of Usage Data

In order to better understand the distribution of resource usage by fields of science within XSEDE, the project allocation data was aggregated by field of science, and fields of science were grouped into categories. To create maps of XSEDE resource consumption, institution data was matched with a lookup table of latitudes and longitudes provided by XSEDE. There are a few projects, such as the OpenWorm Project, which are virtual organizations that list no location. Exactly four organizations had a latitude and longitude of 0,0, and they were removed from the dataset. Because usage data is generated by user interactions with XSEDE resources, and accounting takes place directly based on the accounts used to authenticate and is tied to the XDCDB information, institutional and PI data is understood to be largely correct. The only instances of incorrect information would be a user using another user's account (a violation of XSEDE policies) or incorrect information entered into the XDCDB.

In order to analyze the usage and publication data with respect to location, the Sci2 Tool was used to read the projects file and extract a two-mode network. The two-mode network has two types of nodes: resources and organizations. Resource nodes have attributes such as location (latitude-longitude) and capacity (teraflops). Organization nodes have attributes for location and number of publications (aggregated for all users at an individual organization). Organization location is derived from XSEDE's table of all organizations that use XSEDE. The edges between resources and organizations are weighted by the size of the allocation in CPU usage. The resulting network was analyzed for centrality and node degree distribution using the Network Analysis Toolkit in the Sci2 Tool. Edges above 25M CPU hours of utilization were extracted from the network, nodes were geolocated by institutional information, and the resulting network was overlaid on a map of the United States.
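To make the two-mode construction concrete, here is a minimal networkx sketch under assumed inputs: each record is an (organization, resource, CPU-hours) triple, and the 25M CPU-hour cutoff follows the description above.

```python
# Minimal sketch of the organization-resource two-mode network.
# The `records` format is an illustrative assumption; the 25M CPU-hour
# threshold matches the analysis described above.
import networkx as nx

records = [
    # (organization, resource, cpu_hours) -- toy rows for illustration
    ("Indiana University", "stampede", 30_000_000),
    ("University of Utah", "comet", 12_000_000),
    ("Indiana University", "kraken", 27_500_000),
]

B = nx.Graph()
for org, resource, hours in records:
    B.add_node(org, bipartite="organization")
    B.add_node(resource, bipartite="resource")
    # Accumulate CPU hours across a pair's allocations.
    w = B.get_edge_data(org, resource, default={"weight": 0})["weight"]
    B.add_edge(org, resource, weight=w + hours)

# Degree distribution of the two-mode network.
print(nx.degree_histogram(B))

# Keep only heavily used organization-resource links (> 25M CPU hours).
heavy = [(u, v) for u, v, d in B.edges(data=True) if d["weight"] > 25_000_000]
print(nx.Graph(heavy).edges())
```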

4.4 Institutional Review

This research project has been approved by the Indiana University Institutional Review Board under Protocol #1605848427. Preliminary work for this research was conducted under Indiana University IRB Protocol #1505700642. Interview questions and study information sheets were developed with the help of the Indiana University Bloomington Institutional Review Board. The focus and aims of the research project were reviewed with the XSEDE PI and NSF Program Officer. For individual interviews and observation of closed meetings, study information sheets were made available for respondent/informant review.

Chapter 5

Findings

In order to describe my findings based on the conversations and interactions I have had with XSEDE users, staff, and management over the past few years, I will trace the organization from its roots up, starting with findings from XSEDE users, moving to groups of users (by organization and field of science), and then into the XSEDE organization itself, the interactions between the participating centers, and finally some insight into the NSF itself, as it relates to the provision of cyberinfrastructure support for basic science. Where applicable, I provide analysis of usage, fields of science, and publications in order to help inform the picture of XSEDE I describe. First, however, I will detail the transition from TeraGrid to XSEDE as related to me by a number of informants, in order to describe the structure of the XSEDE organization, the NSF motivations which defined XSEDE's mission and initiatives, and the relationships between the centers which shaped the organizational structure.

Following this first section on the transition between the two projects and the structure and makeup of the XSEDE project, I will detail my findings about how individual users interact with XSEDE and how they work with XSEDE and the NSF to get access to computational resources as well as other benefits, and attempt to understand the makeup of XSEDE usage by looking at the utilization of XSEDE by various fields of science and organizations. Following this, I examine the activities of the XSEDE project, describing the challenges and changes that respondents working within the project related. While I attempt to attribute particular statements to individuals where this is helpful, much of what I report is the result of observation of conversations between staff, or informal conversations between users, based on my notes, taken either during meetings and staff activities or immediately after. In part this description is an attempt to present the stories that the TeraGrid and XSEDE projects tell themselves, and to note where these stories differ from what I have observed on my own.

5.1 From TeraGrid to XSEDE

5.1.1 The XD solicitation

By the end of the TeraGrid project, the centers involved had fully completed the transition from a research and development project, focused on the creation of a distributed grid-like system that could be used by a broad range of computational scientists, to an operational cyberinfrastructure with heterogeneous members, focused on service delivery. The allocations framework and development of the common Service Unit allowed the allocations and reporting processes to work smoothly. Many of the issues around this software had been brought to rough consensus over the course of the TeraGrid, but there remained issues with project governance and integration of Resource Providers. Furthermore, software quality for delivery to the resources was also the subject of much tension. As such, in specifying the activities of the project which would come after TeraGrid, the NSF focused largely on the quality of service delivery and the organization's responsiveness to user needs, rather than adoption of advanced technologies. In June of 2008, the NSF released solicitation 08-571, "TeraGrid Phase III: eXtreme Digital Resources for Science and Engineering (XD)". The XD solicitation specified a set of key attributes for the distributed cyberinfrastructure which would succeed the TeraGrid. The solicitation focused on the development of the cyberinfrastructure based upon "sound system engineering principles", including a platform where the XD operations team would be able to test software implementations before releasing them to resource provider sites. The new organization would also be driven by the needs of its constituency, with an architecture team responsible for choosing new software capabilities based upon demonstrated user requirements.

At the same time, these key attributes required that the successor organization implement existing software solutions, rather than developing new software to meet these needs. The key attributes required the organization to address a broad range of usage modalities. The proposers would need to provide support to researchers in need of sustained long-term usage of high-performance computing facilities and to those who needed brief jobs run on fast systems in order to enable the interactive analysis and exploration of data. The new facility would also be able to support computations with minimal data movement as well as "data-intensive" ones. The XD award specified a set of services which the proposers would need to provide. The NSF described the initiatives as Resources and Integrative Services. Resources consisted of computing and storage services which would be funded through the NSF "Track 2" program, and a remote visualization and data analysis service. Integrative Services would cover a number of functions required to support the cyberinfrastructure. A Coordination and Management award would provide the operations and security of XD. The "Technology Audit and Insertion Service" would identify potential technologies for the improvement and extension of existing computational capabilities. Advanced User Support Services would provide extended consulting services for adapting and optimizing codes to XD architectures, utilizing accelerator technologies, and supporting and extending science gateways which leverage XD resources. A Training, Education, and Outreach Service would provide training and enlist participation in computational sciences from a broad range of under-represented demographic groups.

The NSF specified that proposals for the visualization/analytics and technology audit and insertion services would be separated from other proposals. Institutions could propose either an integrative service which would provide the coordination and management functions as well as one or more of the other services described, or individual items from the integrated services. The operations would constitute the largest portion of the award, but with both the integrative activities and the computational resources part of the XD solicitation, coordination and accountability of the resulting organization would be much clearer than under the TeraGrid. The various institutions involved in the TeraGrid, and some others, quickly created alliances that would draft proposals to the XD solicitation's coordinating function, coalescing into two teams: the XSEDE team, composed of NCSA, the National Institute for Computational Sciences (the University of Tennessee's part of Oak Ridge National Labs), the Texas Advanced Computing Center, and the Pittsburgh Supercomputing Center; and the XRoads team, organized between the San Diego Supercomputing Center, Indiana University, Purdue University, and Argonne National Lab. XSEDE and XRoads both submitted proposals for the coordination and management, advanced user support, and training and outreach activities.

5.1.2 The “shotgun wedding”

After reviewing the initial proposals and conferring internally, a process which took nearly a year, the NSF recommended that the XSEDE and XRoads teams submit a new proposal based on a combination of the proposed activities. The resulting XSEDE team proposal incorporated members from all of the proposing partners, and has been referred to by some of the partners as a "shotgun wedding" between previous competitors. The new organization included both long-term centers with considerable user bases, as well as a number of smaller partners which had demonstrated capability in areas outside of the central service provision of XSEDE. Throughout this process, partners and TeraGrid RPs were expected to maintain TeraGrid operations without disturbing the research carried out on the systems, or the supporting processes like quarterly allocations of resources. In the following few pages I detail the changes to XSEDE from the TeraGrid project architecture and also describe some of the items that resulted from the incorporation of XRoads into XSEDE. Significant details from this integration process are available in the XSEDE Revision Narrative, which explains the response to the NSF's request to join the two proposal teams [167]. Meanwhile, other portions of the XD solicitations were selected with a minimal amount of controversy. The well-developed XD Metrics on Demand (XDMoD) service from the University at Buffalo, under the leadership of Tom Furlani, had analytics and visualization software available before the solicitation was released and stood to provide significant instrumentation to the existing resources. Pittsburgh proposed and was awarded the Technology Audit and Insertion Service, based in part on the software engineering strengths of the Software Engineering Institute at Carnegie Mellon.

Within the newly minted collaborative XSEDE project, however, the partners needed to achieve an organizational stability sufficient to allow them to pursue the objectives set before them. One of the more difficult issues was dealing with software architecture. The new XSEDE proposal included an architectural team incorporating two different technological capabilities developed during the TeraGrid years: the Globus project, headed by Ian Foster at Argonne National Lab, and Genesis II, based on Legion, a project headed by Andrew Grimshaw at the University of Virginia. Both Globus and Genesis II were software implementations that managed job submission and execution as well as providing data access and movement. Globus was part of the CTSS Remote Capability kit provided by TeraGrid RPs, but many found fault with the software, citing that it was too difficult for most users to adopt, given the complex command-line structure, security certificates, and data management URLs involved. Genesis II was intended to make job execution easier and to make use of remote filesystems to provide simplified interfaces for users to adopt, adapting job submission to a menu-based process that generated a submission file, including transfer of needed data into the compute systems and retrieval of results to the researcher's computer. Around the time of the transition between TeraGrid and XSEDE, Globus was undergoing an effort to reposition itself with more user-friendly services grounded in Web 2.0 design principles. The Genesis II project staff struggled with sustaining sufficient adoption to get truly useful feedback on the software, and with maintaining sufficient development staff to act on design initiatives and provide a modern set of tools and interfaces.

Most of the parties involved saw the funding of XSEDE as a matter of survival for the relevancy of the associated centers. NCSA, SDSC, and PSC were original centers that had weathered significant successes and setbacks over the years. Argonne, as a long-time center and the former seat of the TeraGrid GIG, had always played an integral role within the cyberinfrastructure environment. Other partners, notably TACC, NICS, and IU, had developed significant capacity for acquiring and managing cyberinfrastructure resources during their parts in the Extensible Terascale Facility. For the losing team of collaborators, the outcome would quite likely be a significant reduction in funding for new initiatives by the NSF, and potentially a threat to viability. As the proposal process wore on over the course of multiple months, the prospect of restructuring the community became disruptive, as anticipated changes to funding and structure resulted in staff leaving centers for other institutions, or leaving the research cyberinfrastructure community entirely. Informants from the project conjectured that the instability resulted in an overall reduction in available cyberinfrastructure staff, and certainly the number of staff members I encountered who had moved away from SDSC during this time to jobs at other partner sites seems to corroborate the assertion that those who felt that their livelihood might be affected by shifts in the funding environment took steps to find other positions.

Not all of the changes due to the shotgun wedding were negative. The creation of a more formal organization benefited from the weak ties established during the TeraGrid days, as the makeup of XSEDE groups became more diverse in terms of which center provided the staff. Rather than remaining a largely University of Chicago group (from its roots in the TeraGrid GIG), the Software Development and Implementation team became a mixed team with membership from multiple centers, as occurred in multiple XSEDE teams. Rather than carving up the project into domains served by particular centers, the leadership of XSEDE approached the issue of filling each of the responsibilities of the organization by identifying the most appropriate staff from all of the partner sites to contribute to the different XSEDE functions. Most of the teams in the resulting organization were made up of members from across the partner institutions.

5.1.3 XSEDE operations begin: the move towards service

While XSEDE did face a number of challenges that were posed in the Tera- Grid, and a few new ones due to its restructuring, the main focus of the project was first and foremost uninterrupted service delivery to its scien- tist constituents. Several of the systems developed under the TeraGrid, notably the AMIE accounts management systems and the RAS resource allocation system, continued to function as always. TeraGrid RPs at the end of the award period became Service Providers (SPs) under the XSEDE

award. The XSEDE help desk and support structure transitioned to new ticket-tracking software, but the change was largely transparent to users and welcomed by staff, although considerable effort went into acquainting both groups with the new organization and state of affairs.

Formalizing the organization

Structurally, the new organization was a complete change from the organization of the TeraGrid, as already noted: management and reporting functions were organized under a completely top-down reporting structure. Under the advice of NSF program officers, XSEDE adopted software engineering practices intended to ensure that software provided to XSEDE SPs would be of high quality, with implementation instructions sufficient for service provider staff to easily adopt it and make new software and services available. In addition, funding the XSEDE integrating functions under a single award to UIUC, the leading institution, provided considerable structure to the organization that was not built into the TeraGrid. The XSEDE project, once awarded, required a Project Execution Plan which described the overall functioning, requirements and deliverables, governance, and schedule for the project. Furthermore, the structure of XSEDE was designed around the idea that a central organization would provide operations and general functions that cross-cut the organization. The relationship between XSEDE and the resources it provides was made more flexible as well. Timing for the grants that fund

the acquisition of large resources and projects seldom lines up easily. The XSEDE group created an architecture in which a Service Provider could create a resource aligned with XSEDE by installing a base set of XSEDE software, implementing security and authentication protocols accepted by XSEDE, and using the XSEDE accounts and usage systems to report usage against allocations. SP resources could come and go on their own timelines, and XSEDE would continue to provide essential services for the operation of the distributed infrastructure. SP activities are governed by the SP Forum, in which issues between XSEDE and the SPs can be hammered out. The SP Forum has been extended under XSEDE to incorporate three tiers of Service Provider, based on the level of interoperability between XSEDE and the resource. This extensible framework allows XSEDE to present a much greater range of resources to its users than if it were only able to offer systems created under NSF Track 2 awards. As part of the strategy put forth in the Project Execution Plan, to introduce greater organizational clarity and accountability, the XSEDE project organized along the basis of some fairly common project management principles. The Work Breakdown Structure, or WBS, broke the XSEDE project into three levels, with the PI, John Towns, as Level 1, six Level 2 areas, and twenty-four Level 3 areas. Technology Audit and Insertion, funded under a separate award in the XD solicitation, represented a seventh Level 2 area. Figure 5.1 shows the XSEDE Work Breakdown

Structure. Each of the Level 3 areas was represented as a team, with as many as ten or more staff members, or as few as one, depending on the area. Each of the WBS levels represents a hierarchical reporting line, with L3 managers responsible for staff and reporting to L2 directors. The L2 directors and PI Towns formed the Senior Management Team (SMT), which handled much of the work of collating and unifying report materials, responding to NSF requests, and coordinating cross-project activities. While the WBS allowed for accountability within the organization, some staff might be responsible for XSEDE activities that their local supervisor was not involved in. In order to ensure cross-organizational accountability, XSEDE identifies local supervisors for every staff member funded at a partner site via the Statement of Work documentation for that partner site. Another strategy that XSEDE incorporated in order to facilitate cross-site governance was the "canonical full-time equivalent" (canonical FTE). Staff salaries across fifteen different partners made for significant complications in organizing budgets. The canonical FTE set a pay rate at which the project would pay partner institutions the same amount for any FTE working on the project, from PI John Towns to staff members, regardless of location or rank. The canonical FTE was $200,000, "fully loaded", meaning including salary, benefits, and facilities and administration for the staff member. This had the benefit of making centralized accounting and budgeting decisions simple enough to deal with more than 100 of XSEDE's staff.

Figure 5.1: The XSEDE WBS (from the XSEDE wiki)

The canonical FTE also allowed a direct correlation to be drawn between dollars and effort on the project, in a way that simple dollar amounts would not. While the canonical FTE did simplify accounting on the part of the central institution at UIUC, and it did represent a way of equalizing staff commitments across organizations, it also put centers in areas with a higher cost of living at a distinct disadvantage compared to those in less expensive regions. Because the canonical FTE ended up paying for only a portion of highly compensated staff members, centers with higher average salaries had less incentive to participate, and all centers had less incentive to put highly paid, expert staff on XSEDE.

Whether this affected XSEDE's activities and service delivery is a matter of interpretation. When I have observed XSEDE teams selecting staff for specific activities, expertise is generally first and foremost on people's minds, and location (and salary) tends not to enter into the discussion. However, during my observations I have noted that program management staff, that is, those general project managers who assist with compiling reports and drafting planning documents and who have less direct technical expertise, change frequently, sometimes from quarter to quarter. The conclusion I draw from this is that technical skills tend to be narrow, and individual XSEDE technical staff commonly have a niche that they fulfill (operations, virtualization, science gateways), with a reputation for operating in that area, but that project management is a more fluid set of skills, and the project partners reallocate this work more frequently based on a number of factors.

User-driven requirements

As XSEDE moved into fully operational status, it began to adapt to requests from NSF reviewers and from the XSEDE Advisory Board, as well as from the Program Officer, Barry Schneider. In addition to providing frameworks for improved accountability within the organization and improving the transparency of both responsibilities and dollars between project sites, XSEDE engaged in activities to ensure that the XSEDE software would meet two criteria. Firstly, software and capabilities adopted by

XSEDE and made available to users would be driven by user requirements. Some discussion in response to TeraGrid activities had focused on the criticism that software was developed to interest computer scientists and systems architects rather than to support scientific work. New software and capabilities (such as third-party data transfers or single sign-on authentication schemas) were to be driven by demand from the user community, rather than identified from within the project. The hope for this initiative was that less time would be spent implementing novel software configurations that would go unused, in favor of those with documented user requirements. As such, considerable weight was given to defining a set of use cases that would drive development within XSEDE. The use cases were fairly detailed documents. Secondly, the software delivered to XSEDE SPs would need to be operationally ready, meeting a set of quality attributes defined in the aforementioned use cases. As a result, the Architecture and Design group was charged with documenting the user needs driving new functionalities for XSEDE, and in some cases, documenting existing XSEDE capabilities in order to make clear that they answered user needs. The process of documenting need was a slow one, and the A&D team quickly found itself with a considerable backlog of use cases for software improvements inherited from the TeraGrid, including painstakingly creating documents for capabilities users were already incorporating into their XSEDE work. The work of the A&D team was also somewhat complicated by the

generation of designs incorporating Globus, Genesis II, and the UNICORE technologies developed at Forschungszentrum Jülich. Some of the most contentious discussions I observed arose out of trying to meet NSF suggestions about methodology and architectural approach. The A&D team did provide a set of system-wide documents that describe the basic functionality for the XSEDE activities, in the form of the XSEDE Architectural Overview, which described the basic XSEDE architecture functions as: identity management, interactive login, remote file access, submission and management of computation, data transfer, and discovery and provisioning of resource information. These basic functions are facilitated by three layers: access (interfaces for the user), service (connection to resources via standard protocols), and resource (the resources made available by the SPs) [151]. Guidance from the NSF on architecture decisions changed multiple times throughout the course of XSEDE, with the team variously trying to meet requests for new functionalities that included built-in security and reliability for full production, and later being directed not to focus on new development, but only on selecting and implementing cyberinfrastructure software developed by other initiatives. If the TeraGrid represented the NSF's first steps in moving distributed computing from a research and development project towards a production distributed computing environment, the XSEDE project marked the creation of a service organization concerned not simply with delivering hours of computer time and available systems,

but with the quality of service, its capabilities for new means of access and new activities, and with approaching different types of researchers to bring them into its user base.
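Read as code, the layered design described in the Architectural Overview might look like the following minimal sketch; every class and method name here is a hypothetical illustration for exposition, not an actual XSEDE software interface:

    # Hypothetical sketch of the access/service/resource layering described
    # above; none of these names correspond to real XSEDE components.

    class AccessLayer:
        def authenticate(self, user):
            # identity management: the user-facing interface issues a credential
            return f"token-for-{user}"

    class Resource:
        def __init__(self, name):
            self.name = name  # a compute or storage system operated by an SP

        def submit(self, job, token):
            # submission and management of computation happens at the SP
            return f"{job} queued on {self.name} using {token}"

    class ServiceLayer:
        def discover(self, job):
            # standard protocols connect user interfaces to SP resources
            return Resource("sp-cluster")

    access, service = AccessLayer(), ServiceLayer()
    token = access.authenticate("researcher")
    print(service.discover("md-simulation").submit("md-simulation", token))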

Performance measures

In addition to newly formed architectural processes for XSEDE, further evidence of the organization's quality-of-service approach arrived in the form of the adoption of performance management processes. In order to document and improve XSEDE activities, the organization adopted elements of the NIST Baldrige Performance Excellence Program [82]. As part of the Baldrige criteria adopted by XSEDE, activities included the identification of vision, mission, and goals statements and the identification of Key Performance Indicators (KPIs) supported by WBS area metrics. Each element of the WBS produced mission and vision statements that were to guide the activities of the group throughout the process. A significant amount of debate was spent on the construction of KPIs for each area and the metrics that would relate to them. As is common for organizations which have not constructed such metrics before, significant time was spent positing, adjusting, and learning about what activities could be counted. KPIs for a number of teams were revised multiple times throughout XSEDE in order to better capture the activities of the team and the outcomes desired. These performance management activities were augmented by several other activities, including the architectural process

improvements described above, which were recommended by the Software Engineering Institute. Other activities included internal suggestions from area managers on process improvement, adaptations stemming from the cyberinfrastructure environment, and the staff climate survey. As part of the XSEDE Training, Education, and Outreach WBS area described below, XSEDE included an External Evaluation Team, which, with the help of other staff and area managers, constructed a number of surveys of XSEDE users as well as the staff climate survey. The climate survey gave XSEDE staff an opportunity to suggest areas for improvement within the project anonymously, and particularly brought to light issues with diversity and gender relations within the project. The staff climate survey, conducted on an annual basis with voluntary responses from across XSEDE staff, also examined relationships between different components of XSEDE, noting that communication and collaboration were seen as good but needing improvement and standardization across the organization, and noting that more transparency in the decision-making processes of the Senior Management Team and of the User Requirements Evaluation and Prioritization working group would greatly improve organizational alignment.

Another part of XSEDE which created formal initiatives built on ideas from the TeraGrid was the Training, Education, and Outreach integrating activity, incorporated as an L2 directorate within the XSEDE WBS. Under TeraGrid, training and outreach activities were organized to recruit and train new users and provide students opportunities to work

with advanced systems. The XSEDE TEOS group provides user education; student engagement; the XSEDE "Campus Champions" program, in which volunteers at higher education institutions provide local outreach to researchers and help them make use of XSEDE; "campus bridging", which promotes the use of campus cyberinfrastructure to interoperate with XSEDE and ease the transition from campus to national cyberinfrastructure; and Underrepresented Community Engagement, which reaches out specifically to minority-serving institutions (MSIs), historically black colleges and universities (HBCUs), and tribal colleges. The overall mission of the TEOS group is to stimulate the adoption of XSEDE by new users, whether in traditionally computational disciplines or those new to the use of computational resources. In addition to providing next-generation computational services, the cyberinfrastructure provided by XSEDE can also provide basic resources for those at institutions that lack the capital or technical know-how to provide their own cyberinfrastructure. What outreach staff for TeraGrid, XSEDE, and elsewhere have understood for a number of years, however, is that there is a considerable gap in technical skills and computational understanding that needs to be bridged before researchers at those institutions can make effective use of these systems. These difficulties remain in place throughout the XSEDE project, and they become more marked as the modalities we associate with personal computing become more simplified and more responsive, while the basic means of using HPC resources is the

same as it was during the Supercomputing Centers program: submitting a job to a batch processing system and waiting for results. While training and outreach activities provide a means to help researchers bridge that gap, as our day-to-day computing continues to diverge from computational research, that gap will continue to widen.
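Concretely, the modality that has persisted since the Supercomputing Centers era looks roughly like the sketch below. The #SBATCH directives are standard Slurm scheduler syntax, but the job name, resource limits, and program are hypothetical:

    # Write a batch script, hand it to the scheduler, wait for results.
    # All names and limits here are illustrative.
    import textwrap

    script = textwrap.dedent("""\
        #!/bin/bash
        #SBATCH --job-name=md-run
        #SBATCH --nodes=4
        #SBATCH --time=12:00:00
        srun ./simulate --input system.prm
    """)

    with open("job.sh", "w") as handle:
        handle.write(script)

    # Submission (e.g., subprocess.run(["sbatch", "job.sh"])) returns
    # immediately; results arrive whenever the queue schedules the job.

The asynchrony in the final step is the gap the text describes: nothing about this loop resembles the immediate, interactive feedback of modern personal computing.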

5.2 Understanding XSEDE users

In the following sections, I return to my research questions about the XSEDE user base and the XSEDE organization, starting with the user base, and discuss some of the claims that are made about the direction of computational users and the kinds of activities in which they engage. These questions, again, are: who is XSEDE's user base? What are their needs? How do they get what they need from the organization?

5.2.1 Changing fields of science in XSEDE

The XSEDE user base is encompassingly broad, and despite the challenges of outreach, there are a significant number of researchers making use of XSEDE resources in one way or another. In 2015, towards the end of the project, the XSEDE User Portal counted more than 20,000 active portal users, about half of whom held allocations [16]. Many of the active users without allocations were registered with the portal in order to take part in training, workshops, or other XSEDE-sponsored activities. There are users of XSEDE resources from every NSF directorate, including

Arts and Humanities, and the XDMoD metrics portal reports 147 fields of science charging resource use to the allocation reporting system over the past five years of the project. That being said, the bulk of usage falls into a set of traditionally computational fields of study: across the five-year period of the XSEDE award, the top consumers of resources per job submitted were fields of science typically associated with the computational sciences. These were Engineering Systems, Galactic Astronomy, Polar Glaciology, Gravitational Physics, Global Atmospheric Research, Nuclear Physics, Polar Earth Sciences, Physical Oceanography, and Seismology. The exception to this was Law and Social Sciences, which used the fifth-most resources per job during this time. A chart of the largest per-job usage over the XSEDE period is displayed in Figure 5.2. Usage in each of these fields of science is overwhelmingly attributable to a single researcher, or sometimes entirely to two researchers in the field. The outlier field of Law and Social Sciences results from the usage of XSEDE by Dov Cohen, a psychologist at UIUC. Looking at the largest institutions making use of XSEDE, it seems that proximity to XSEDE partner sites makes a difference, at least at the very top end of the scale. A map of the top institutional users of XSEDE is visible in Figure 5.3. The three largest institutional users of XSEDE are at UIUC, UC, and UCSD, respectively. It would seem that having XSEDE staff nearby facilitates both broader and greater usage of XSEDE resources.
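The per-job ranking used here is a simple ratio: normalized units charged divided by jobs submitted, grouped by field of science. A minimal sketch with pandas follows, in which the column names and records are hypothetical stand-ins rather than the actual XDMoD export schema:

    import pandas as pd

    # Hypothetical accounting records; column names are illustrative,
    # not the real XDMoD export format.
    jobs = pd.DataFrame({
        "field_of_science": ["Galactic Astronomy", "Galactic Astronomy",
                             "Seismology", "Seismology", "Seismology"],
        "nus_charged": [1_200_000, 800_000, 90_000, 60_000, 30_000],
    })

    # Normalized units per job submitted, by field of science.
    per_job = (jobs.groupby("field_of_science")["nus_charged"]
                   .agg(total_nus="sum", job_count="count"))
    per_job["nus_per_job"] = per_job["total_nus"] / per_job["job_count"]
    print(per_job.sort_values("nus_per_job", ascending=False))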

Figure 5.2: Field of Science usage during XSEDE

Despite the fact that neither UIUC nor UC operate resources which feature in this group, the number of outreach activities and the draw of having a computational center pull highly computational researchers to these institutions, and the ease of providing training and outreach activities locally also helps recruit users who might otherwise pass XSEDE by. It is important to note that the TACC Stampede resource is the greatest supplier of cycles to the institutions in Figure 5.3, followed by NICS Kraken, yet neither TACC nor NICS makes the top-10 utilization list as an institution (TACC is in 18th place and NICS in 48th), although users at SDSC show a clear preference for local resources: SDSC's systems make up more than two-thirds of the resources consumed at UCSD. Other XSEDE partners that show up within the top consumers of XSEDE NUs (normalized units) are Cornell (#12) and CMU (#17). Georgia Tech is another Service Provider making use of

Figure 5.3: Top 10 institutional users of XSEDE


local resources, as the Keeneland system provides significant capacity. Not only is the top end of XSEDE utilization concentrated at sites with significant resources of their own, it is also clearly concentrated in the R1 universities of the United States. These institutions constitute a large percentage of the most active and largest users of XSEDE resources. Figure 5.4 shows the location of projects using more than 10M normalized CPU hours and the resources utilized by these institutions. When I presented this map to a member of the XSEDE allocations team, he remarked on how concentrated usage is among R1

sites and how little there is from states without significant resources at higher education institutions. Relatively few institutions represented in the figure come from states which have been recognized as beneficiaries of the NSF's Experimental Program to Stimulate Competitive Research (EPSCoR), which invests additional funds in support of research activities in order to enhance the research capabilities of these states and provide significant avenues for scholarly advancement [2]. Alabama, Colorado, Iowa, Kansas, Louisiana, Oklahoma, and Tennessee are prominent EPSCoR states with usage on the map, but this is less than one in five EPSCoR states making significant use of XSEDE resources. While, as I describe below, individual scientists with one or two projects in a field of science can drive some of the top usage of XSEDE, it does not appear that states under the EPSCoR program generate usage on par with the other users of XSEDE resources.

5.2.2 The “Long Tail” of XSEDE

Discussing the change in usage of XSEDE with my informants, it is clear that XSEDE staff feel that a change is in progress. There is significant discussion within XSEDE about the development of science gateway usage, which became the most frequent means of submitting jobs within the XSEDE project; about new scientific software being aimed at large data sets, notably bioinformatics software, which is not built to take advantage of HPC resources; and about means of dealing with data sets in structured formats such as transactional databases, or clustered storage systems such as Apache Hadoop or NoSQL stores.

Figure 5.4: Map of XSEDE utilization and resources

As Hallam Stevens notes [146], bioinformatics research altered its questions in order to take advantage of computational resources, but many of the codes developed under bioinformatics' transition to computing were written for personal computers, based on programming languages such as Python and R that were not originally created to work on extremely large data sets. A frequent issue heard when discussing support for bioinformatics research is the researcher or graduate student who writes a code that tries to load entire terabytes of data into memory, on the natural assumption that distributed systems with terabytes of memory should support what the coder's workstation cannot (a pattern I sketch below), making for a difficult adaptation to the XSEDE environment. The proceedings of the TeraGrid 2011 Conference and successive XSEDE annual conferences show particular concern with adapting to new codes in this way. While these proceedings include science track papers on the large-scale use of distributed resources that was typical of TeraGrid and continues to be a large portion of today's XSEDE utilization, there is also significant space dedicated to questions about providing services for these types of codes, how to adapt them to XSEDE SP systems, and how to facilitate their use of resources. These new disciplines and techniques have earned the name the "long tail" of computational sciences, in that they may not be large-scale users, but still provide significant discoveries across the spectrum of usage, as Geoffrey Fox notes in his study of

science across the Branscomb Pyramid [10]. One way to look at changing fields of science across the XSEDE project is to examine usage changes by discipline. Figure 5.5 shows the 10 fields of science with the largest drop and the 10 fields of science with the largest increase in normalized units of usage across the timeframe of the XSEDE grant. All of the disciplines with less utilization are traditional highly computational disciplines, such as Nuclear Physics, Particle Physics, and Astronomy. Most of the large gains are also from standard computational disciplines, although the largest gain overall is in Biophysics and Molecular Biosciences, both of which are fairly new to large-scale computational workloads. This confirms the findings of Furlani et al., who identified similar trends up to 2013 [68]. Other fields which increased during the XSEDE project were Biochemistry and Molecular Structure and Function and Systematic and Population Biology. With four out of the ten largest gains in usage going to biological sciences, it appears that bioinformatics research has effectively made the switch to large-scale computational infrastructure. It may be the case that the time needed to adapt to XSEDE resources is long enough that new disciplines lag significantly before adoption can be complete. In 2011, at the beginning of XSEDE, there was considerable discussion about the rise of bioinformatics research as computational consumers, and these changes would seem to indicate that the rise has indeed occurred, over a five-year stretch. What remains to be seen is the appearance of other "long-tail" disciplines which

may be arising over time.

Figure 5.5: Changes in utilization by field of science, 2011-2016
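The memory-hungry coding pattern described above, and the streaming alternative that shared batch systems reward, contrast in just a few lines. This is a minimal sketch; the file name and the counting example are hypothetical:

    # Whole-file loading versus streaming. "variants.tsv" is a placeholder
    # for a large tabular data set.

    def load_all(path):
        with open(path) as f:
            return f.readlines()  # holds every record in memory at once

    def stream(path):
        with open(path) as f:
            for line in f:  # one record in memory at a time
                yield line

    # load_all() fails once the file outgrows a node's memory, while a
    # streaming pass stays within a batch job's limits, for example:
    # record_count = sum(1 for record in stream("variants.tsv"))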

This "long tail" of science encompasses a large number of activities. One way to classify it is simply to identify those fields which are less active than traditional fields. It may be possible to identify the members of this "long tail" by looking at fields of science with the fewest XSEDE projects. The range and breadth of XSEDE projects can make examining XSEDE usage patterns difficult, but it is possible to limit the elements involved. Figure 5.6 shows the NSF fields of science with fewer than 10 XSEDE projects, meaning that few Principal Investigators in these fields have engaged with XSEDE. These projects range across many different disciplines, but notably include examples from the social sciences and the humanities. Some of the disciplines with few projects are also disciplines with a considerable amount of usage; most notably, Polar Glaciology shows up in the list of fields with the fewest projects as well as among the top per-job fields. These fields with few PIs working with XSEDE are still deriving significant usage from the resources, and can even drive the top utilization of XSEDE.


In terms of the "long tail" as it represents new types of analysis and new disciplines, the usage seems to indicate a lack of these kinds of specialized computational analyses. While XSEDE SPs are providing resources for new fields (note the prominence of the "Other" field in Figure 5.6), the bulk of XSEDE service delivery has been for its traditional audience. Examining the XSEDE software search tool on the software search portal, there are a large number of bioinformatics codes available, as well as libraries such as HDF (a big-data file format) and big-data-r (statistics packages for handling big data), although usage of libraries on XSEDE is difficult to track. Bioinformatics software clearly has significant uptake on XSEDE, based on the changes in usage over the five-year XSEDE period. However, these new ways of storing and accessing data appear to be slower in taking hold. It may be the case that some of these users are simply identifying other resources for carrying out analyses. One of my respondents, a former researcher and now academic administrator in charge of identifying computational resources for an institution with a considerable bioinformatics and big-data research agenda, stated simply that "the long tail has already left", meaning that, rather than adapt their forms of computing to XSEDE and work out how to do batch job submission and similar activities, these researchers implement via cloud-based technologies such as Amazon EC2 or Microsoft Azure. For these researchers, it appears to be easier to pay for compute time through a third-party provider and run on systems that are more familiar than it is

to create an XSEDE startup allocation, adapt codes to work on XSEDE systems (sometimes with considerable data transfer requirements), and submit jobs to batch systems and wait for their completion. Another informant bore this out when he told me that he was strongly considering leaving his position as a faculty member in order to pursue research in predictive analytics and statistical modeling. While the informant had completed what he regarded as good work on XSEDE resources thanks to a student fellowship, he regarded the new areas of data analytics as more exciting and more remunerative than his current position as a faculty member. Not only may the long tail be outside of XSEDE, it may be outside of academia completely.

5.2.3 How users leverage XSEDE

Further discussion within the cyberinfrastructure community and within cyberinfrastructure outreach organizations such as SURA frequently centers on how to provide services that are usable by universities with fewer resources than those of the typical researcher on XSEDE. Part of XSEDE's mission to create a national research system is intended to even out distributional issues with open cyberinfrastructure. If researchers need to make use of advanced resources which are not available at their own institution, they can avail themselves of XSEDE resources. The XSEDE staff I met with who are responsible for broadening participation and bringing new communities of users to the project had few illusions about the

readiness of their audiences for engaging with XSEDE. Most of the activities with these communities revolve around providing a basis in computational techniques that can be carried out on personal laptops or lab computers. XSEDE resources are utilized to provide training, but the importance of the experience is to provide a computational basis with which these researchers can engage with their field of study currently, positioning XSEDE as a potential future resource. My conversations with XSEDE users bore this out as well. I spoke to a number of users and campus champions who were appreciative of and extremely positive about XSEDE's offerings, but who admitted that their current set of research projects didn't make use of XSEDE allocations. However, these users were extremely savvy about the benefits of participating in XSEDE activities. One user mentioned her participation with XSEDE and her plans to use SP resources in her proposal to the NSF for another research activity, which was subsequently funded. Another user made use of XSEDE training materials and supplementary activities, including an award for student funding, to further his research agenda, and did transition to carrying out research in XSEDE. Another respondent, a campus cyberinfrastructure professional, discussed working together on a pilot project in order to get firewall changes made at his institution which would be beneficial to the efforts of the researchers he serves. Respondents made clear in interviews that while cyberinfrastructure was vital to their research aims, their relationship with XSEDE was

not a collaborative one – that is, XSEDE does not appear to inform the science that it supports. Rather, the project provides a set of resources of various kinds, not just computational, that researchers make use of to support their work. No respondent noted that the nature of their inquiries might have changed based on what types of analysis and what tools were available to them in XSEDE. All of the preceding denotes a user base that is firmly rooted in instrumental use of XSEDE. Whether that is as a research tool, or as a tool for leverage to affect other goals in their research agenda, depends largely on the individual researcher's situation. Users from traditionally computational fields, who are established in their usage patterns, have adapted their tactics to include not just the use of XSEDE resources; they have also built their credibility with the NSF, with other researchers, and with individuals at their own institutions. In the case of the researcher who drew on her experiences with XSEDE's training programs, she viewed taking advantage of an existing NSF program to improve her computational skills, and displaying a willingness to make use of XSEDE resources to support other NSF-funded research, as giving her the credibility and the legitimacy to successfully propose for further NSF resources. The use of national computational resources is strategic for the researcher proposing for a grant. Rather than proposing to purchase her own computers and manage them, she can cite her other activities funded by the NSF for the purpose of supporting her research, and spend more time on the research

activity. Like many of the researchers at this scale of engagement, interacting with XSEDE and declaring an intent to use XSEDE resources when the time is right is a means that researchers have to build their Latourian credibility with the NSF. Respondents among XSEDE staff who are responsible for engagement activities are familiar with this situation. In speaking with staff charged with providing broader engagement activities, I learned that the bulk of the broader community that the NSF mandates XSEDE support is not prepared to engage with the XSEDE infrastructure, whether due to a lack of resources for training at local institutions or a lack of sufficient technical skill training in graduate programs. Despite this, through training and informational programs, XSEDE can support these researchers in engaging in their own work. Working with XSEDE programs and participating in XSEDE initiatives lends legitimacy to these users' activities, no matter what activities they pursue or on what scale they engage with the project.

5.2.4 Researcher tactics

Researchers making use of XSEDE in traditional ways also noted that they had tactics for dealing with the XSEDE organization to get the most value possible out of it. One respondent, who is a developer of a molecular dynamics package run on a large number of XSEDE resources and who uses this software on these same resources, described to me the development of tactics for dealing with the XSEDE allocations

process. XSEDE research allocations requests are peer-reviewed for merit by a group of volunteers who meet on a quarterly basis. The XSEDE Research Allocations Committee (XRAC) awards those requests with sufficient scientific merit, appropriateness of the proposed analyses to the resources requested, and feasibility of returning scientific results. As part of my observations I attended a meeting of the XRAC and was fascinated by the debate over awarding allocations. The XRAC, after a meeting including updates about newly available resources and information about the number of available NUs to be awarded in the current round, reviews each of the allocation requests in turn and decides whether or not to award it. At the end of that process, the awardees are allotted amounts of the award. Typically, requests outnumber available NUs for allocation by about 3:2, and the typical solution is to reduce awarded allocations accordingly, depending on the decisions of the Allocations Committee. Given the demand for resources, and the fact that many researchers writing allocations requests are necessarily skilled in describing their work in the best possible terms, the XRAC tends to look for reasons that might disqualify an allocation. On the decision about whether to award, generally an XRAC member from the same field as the proposing researcher will present their opinion about the request. The members of the XRAC meeting I attended discussed the merit and feasibility of requests, but the conversations also tended to range over more logistical concerns as well. XRAC members also reviewed the proposer's publishing history, whether

they had cited XSEDE as supporting previous work, whether it was the PI or graduate students who would be completing the work, and whether an allocation request was legitimately in need of the amount requested. Sometimes similar requests were called out as coming from multiple members of the same lab or project requesting additional resources in order to get around policy limits. My respondent the molecular dynamics developer confirmed that some of the requesting researchers certainly knew the tactics of the allocations committee and employed their own tactics to get allocations approved with as little reduction in resources as possible. While having multiple members of the same research group request allocations was regarded as too easy for the XRAC to identify, inflating the amount of resources requested, with the rationale that all requests get reduced by some amount, was cited as one way to get the desired results. Another tactic was to omit certain details from the application. My respondent noted that requests that described NIH as well as NSF support for computation were frequently turned down, with the reasoning that the researchers could get their computational resources from NIH rather than making use of overcommitted XSEDE resources. The picture that emerged from talking with my respondent and watching the XRAC make its determinations was of two sides, largely communicating via the proposal and review documents, making use of limited information in order to make determinations about resource allocations. Researchers could build credibility

with the XRAC, most frequently by demonstrating prior work on XSEDE or, lacking that, by making use of an XSEDE Startup Allocation to demonstrate the running and scaling of codes, and by referencing XSEDE in their publications as providing resources to support their work. Notable requests that I saw disapproved during the XRAC meeting seemed to be those that reduced the credibility of the requesting researchers, either by engaging in clear behaviors to receive more resources, or by publishing too slowly for the XRAC's standards (the XRAC viewed this as "wasting resources" without producing results, although no standard for producing results is part of the allocations policy or broader scientific work). Certainly attempts to game the peer-review system in the process of publishing in scientific journals are nothing new; tactics for getting journal articles accepted range from the relatively innocuous to the seriously unethical [58]. While it is extremely doubtful that collusion or reviewing rings exist within the XRAC – the membership of the committee is much more stable than that of a journal, and the members conduct reviews all at once in the same room – it also appears that there are ways for users to get better access to resources, based on the way that they present their other support (such as omitting NIH grants) and by asking for more than they expect to receive. Another reason to work with XSEDE frequently cited by users of the organization is the ability to network and gain experience and information from other XSEDE users as well as staff members. The most prominent example of this is the Campus Champions group, whose

membership ranges from the cyberinfrastructure administrators at some universities to faculty or even non-technical administrative staff. Some are users of XSEDE resources, some are charged with helping faculty get a start on making use of resources, and some combine a number of roles together. This group carries on a lively ongoing discussion via email list, in-person workshops, and the annual XSEDE meeting. The topics range from hardware acquisition and implementation to application tuning, user support, and facilitation. The Champions are the most intensely connected group of XSEDE users, but other XSEDE users I approached also noted the opportunities for interaction with other XSEDE users. What did seem prominent was that users felt singular in their use of XSEDE relative to other users at their own institution or in their department. Most of the respondents I discussed XSEDE usage with might work with other faculty on the same XSEDE project, but few had much to say about getting other faculty members to make the transition. One respondent said outright that the other physics faculty at his institution had no interest in learning how to use the resources and do more computational investigations. It seems that XSEDE usership, or perhaps engaging in computational approaches more generally, represents a particular choice for researchers that determines some of their course of inquiry, and not all elect to make use of resources simply because they are on offer for free. For the researchers who do engage with XSEDE, however, there appears to be ample opportunity for collaboration and for networking, which

results in further research opportunities. Turning to a quantitative examination of relationships between XSEDE users, an examination of XSEDE outputs in the form of publications is informative. One way to quantitatively represent relationships between researchers is to make use of coauthorship networks, in which researchers who are authors on publications together are linked by their common publications. The network of XSEDE researchers who publish work based on analyses carried out on XSEDE resources is extensive. The largest fully connected subcomponent of the XSEDE coauthorship network, based on all researchers who provided publication information to the project, is shown in Figure 5.7, which is the result of my analysis of the publication data. This network has 9,256 authors, with each author in the network associating on average with two other authors (the average degree of the network is 2; the average weighted degree is 3.715). Modularity clusters comprising more than 5% of the network are shown by color, with the most-published XSEDE author in each of the eight communities detected at this size labeled. Modularity clustering identifies these eight communities as having more coauthorship interactions within themselves than with those outside of their communities.
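The network statistics cited here can be computed directly from a coauthorship graph. The original figure appears to come from a dedicated network analysis tool, but the following minimal sketch with NetworkX reproduces the same kinds of measures on a toy graph with hypothetical authors:

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    # Toy coauthorship graph; edge weights count co-authored publications.
    G = nx.Graph()
    G.add_edge("author_a", "author_b", weight=3)
    G.add_edge("author_b", "author_c", weight=1)
    G.add_edge("author_c", "author_a", weight=2)
    G.add_edge("author_d", "author_b", weight=1)

    n = G.number_of_nodes()
    avg_degree = sum(d for _, d in G.degree()) / n
    avg_weighted = sum(d for _, d in G.degree(weight="weight")) / n
    print(f"average degree {avg_degree:.3f}, "
          f"average weighted degree {avg_weighted:.3f}")

    # Modularity-based communities, analogous to the colored clusters in
    # Figure 5.7 (which, of course, used a far larger data set).
    for i, community in enumerate(greedy_modularity_communities(G, weight="weight")):
        print(i, sorted(community))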

5.3 Understanding XSEDE and the CI community

As described under Section 5.1.3, the beginnings of XSEDE were rooted in adaptation. Firstly, the XSEDE project as proposed was modified to incorporate elements of the XROADS proposal.

Figure 5.7: The XSEDE Coauthorship Network

Shortly after its inception, the project embarked upon a set of activities to provide additional governance and monitoring functions which would give stakeholders additional information about XSEDE's execution of its activities. The architectural processes adopted by XSEDE with input from the Software Engineering Institute were focused on the provision of capabilities rooted in user requirements. Over the course of executing the project, the evaluation team has conducted annual satisfaction surveys, staff climate surveys, and surveys about individual elements of XSEDE for incorporation in the XSEDE management structure. XSEDE has been a project that has taken external scrutiny to heart and made numerous changes to adapt. Over the course of the project, emphasis has moved from the development and "hardening" (increasing the robustness and security) of software for distributed computing, to responsiveness to user needs, to effectively describing XSEDE's return on investment as compared to returning to the days of the Supercomputing Centers Program, when no central coordination between centers took place and competition for both researchers and staff was intense [143]. Grant-funded projects tend to be sensitive to stakeholder requests for changes and for information, and the XSEDE organization does not differ from the norm significantly. The cyberinfrastructure community does frame itself as facing multiple "crises of the community", which may lend a sense of urgency to the project and

177 the proposed changes.

5.3.1 Cyberinfrastructure Crises

As noted by Ensmenger, the computer industry has long framed its activities in terms of "crises" that it faces, initially in terms of a software crisis stemming from a lack of programmers [56]. The cyberinfrastructure community is no different in this respect. While attending XSEDE and NSF workshops, I heard about two crises that the cyberinfrastructure community framed for itself. Most often I attended presentations that started off with descriptions of crises of capacity, centered on the demand for computational capacity; even more frequently, I participated in conversations about the ways to manage, share, and transfer data. Data storage capacity was a subject of discussion within the XSEDE project at the first four quarterly meetings I attended, in the form of discussions about an "XSEDE-wide file system". The pilot project I was involved with was also aimed at making data sharing between systems transparent. I also attended discussions at two CASC meetings and a further NSF workshop that focused on questions of developing data management plans, supporting the need for better tools to handle large amounts of data and metadata. Meetings among cyberinfrastructure leadership that I attended described computing professionals facing the "data deluge" which would overwhelm the current means for storing and retrieving data. This is largely the rationale provided in the periodic reports on the need for providing HPC

resources to the research community [84, 96, 26, 35]. While the demand for resources continues to outstrip supply, this is not a situation that is expected to change, barring a radical reconfiguration of government expenditures. A second crisis manifested in the inadequate workforce for cyberinfrastructure. There is a fair amount of agreement within the cyberinfrastructure community that there are insufficient professional staff and managers to meet existing needs, and that there is no training and education pipeline which adequately prepares students to engage with the profession [135, 114, 32]. At one of the meetings I attended for cyberinfrastructure professionals, the leader asked everyone in the room under 40 to raise their hand. In a room of about 50 attendees, fewer than five did. The cyberinfrastructure workforce crisis is akin to the oft-mentioned workforce issues in Science, Technology, Engineering, and Math (STEM). The STEM workforce crisis is a matter of considerable debate around the extent of workforce needs and the policy initiatives which best address the situation [168, 133, 107]. The difference for the cyberinfrastructure workforce is that STEM degrees abound, while for the cyberinfrastructure professional there is no formal route to enter the field via a degree program or sub-program. Most of the people I met who work for XSEDE, the Open Science Grid, or other projects, and most of my informants, were either products of STEM education who "diverted", in the words of one, or computer science students who came to HPC instead

of going into industry or pure computer science work. Many of my co-workers and co-managers within XSEDE were trained in computational science disciplines; they noted that they felt they had an aptitude for the computational and began looking for opportunities within cyberinfrastructure rather than pursuing academic careers. Outside of a few courses on computational disciplines, and a few annual workshops on building skills for system administration, few venues exist for learning about performance tuning, advanced storage, or managing large-scale federated systems. As such, the XSEDE workforce has few formal structures by which to professionalize, enforce norms, and develop staff towards leadership. Certain activities, such as the annual SC conference, tend to be opportunities for staff to network, learn, and develop, but these opportunities are fairly few, and most of the XSEDE workforce needs to find its own way, with the stock of future leaders in fairly short supply. The risk facing the overall cyberinfrastructure community is that there will be serious shortfalls of leadership and difficulties creating and enforcing norms that provide common ground for the future leaders who eventually take the helms of the various supercomputing centers. The environment created by these stories about crises is reinforced by the direction of NSF spending, in line with general science policy spending. During my first year working with XSEDE in 2011, I heard the phrase "flat is the new doubling", meaning that where science budgets had doubled in the past, the expectation was now that researchers would be able to

accomplish the same or more with steady resources. Midway through the XSEDE project, that phrase was modified to "10% down is the new flat". Indeed, during my work with the project, I was twice asked to identify which activities would be cut under a 10% funding reduction. The crises of resources and human resources within XSEDE, compounded with budget concerns, made for a particular urgency around the project requirements, and responsiveness to the NSF review process. Although there was never a threat that the project might be de-funded, everyone involved stressed the importance of performing well, in part to distinguish each of the participants for their future proposals and work with the NSF. Even XSEDE staff who showed signs of being more interested in their own center's fate were motivated to be responsive to requests, in order to build and retain credibility for their research organization. Whereas XSEDE users leverage the organization in order to enhance individual credibility, the XSEDE partner organizations leveraged performance for stakeholders, visible service delivery, and science highlights in order to establish organizational credibility, preparing the partner organizations for the environment after XSEDE, as well as for proposed activities alongside XSEDE, such as Track 2 solicitations.

5.3.2 Adaptations in the Virtual Organization

These pressures on XSEDE resulted in organizational tendencies to adapt to the environmental requirements. XSEDE management spends consid-

erable time writing documents: quarterly and annual reviews, responses to reviewer comments, annual program plans, and responses to the advisory board. Much of the work of becoming more responsive and managing the requests of the NSF focused on the improvement of processes. XSEDE constantly engaged in process improvement activities throughout the project. This section notes some of the activities adopted in order to improve the internal processes of XSEDE and responsiveness to XSEDE users as well as stakeholders. First and foremost, XSEDE adopted a number of improvements in order to assist with collaborative work. The most basic of these dealt with an issue common to the TeraGrid as well as many virtual organizations: decisions made in distributed meeting settings were tentative. Until a central location for recording organizational decisions was established, leadership found themselves re-hashing the same conversations again and again. As members of a virtual organization, leadership often met over the phone or by Skype, reducing the immediacy of their interactions. Early in the project, the need for documenting decision-making activities became clear. Documentation of processes and the activities in these processes enabled XSEDE to pursue actions based on the decisions made by leadership and to refer back to the documentation in the event of questions about the interpretation of those decisions. These ranged from relatively informal activities, such as SMT meeting minutes kept by a program manager, to the documentation of software requests by the SD

& I team, which took place through a set of templated documents and issues in the Jira issue-tracking system. These improvements allowed XSEDE to monitor and report on activities, and also allowed the organization to keep track of SP activities in cases of slow or incorrect software implementations. As mentioned in Section 5.1.3, XSEDE adopted a number of metrics and performance indicators, and through the course of the project, these metrics were adjusted on a regular basis. Frequently, metrics were found not to count the output or outcome desired by the project, and needed to be revised. Similarly, targets for metrics often needed to be revised based on the performance of particular activities which were more successful than expected or which encountered unanticipated obstacles. A portion of XSEDE management spent time on developing metrics for return on investment based on the cost of operations of individual centers versus the cost to centralize operations, as well as the value provided to users [150]. This exercise also surveyed center staff and users about their perceptions of XSEDE's value, sometimes in telling ways. Not all of the center responses gave XSEDE central management a positive valuation in terms of contributing to the overall research environment, indicating that for some, XSEDE was regarded as providing more drag than lift to efforts to build cyberinfrastructure. Nevertheless, in terms of cost models, XSEDE's single operations center and security staff appear to provide significant savings over implementing similar functions across multiple centers.

XSEDE has also taken on the role of professionalizing the cyberinfrastructure workforce as well as providing more education to researchers with normative outcomes, in the form of implementing badging and offering continuing education credits for engaging with XSEDE training. Badges, which are virtual signs of completion that can be used to show awareness or mastery of particular tools or skills, contribute to the XSEDE workforce pipeline. Continuing education credits are useful to researchers for demonstrating their own engagement with professional development by building similar skills. These efforts, developed by the TEOS group during the course of XSEDE, build both users and staff, and provide a way for those who have completed HPC training to have some sign of their learning activities. XSEDE's development as a virtual organization also rested on technical tools which support collaborative work: the project wiki, Skype for Business, and the Atlassian tool suite.

Chapter 6

Conclusions

This research project attempts to make use of a singular degree of access to project resources, users, and staff members in order to get a view of the workings of a significant project funded by the NSF. XSEDE is intended to be a solution which provides resources available to all researchers regardless of their institution's means. Part of the difficulty the XSEDE project faces results from the fact that not all researchers are prepared to make use of XSEDE's offerings. Observations and interviews with both researchers and administrative staff identify that despite the lack of fit between some researchers and XSEDE's resources, those researchers are able to make use of XSEDE in order to gain the legitimacy to conduct other activities. Experienced users aware of XSEDE policy and activities maximize their access to resources through the tactics described in Section 5.2.4, and users also make use of opportunities to collaborate with other researchers and staff across the project.

Another way to think about crises of capacity is that there is a mismatch between the resources needed and those that are provided. If computational techniques provided the "third leg" of science in the form of

modeling and simulation, and XSEDE as an organization is firmly rooted in the provision of typical HPC resources, then the rise of big-data techniques represents another type of inquiry. Big-data investigation techniques are not yet well fitted to the XSEDE modality, nor are the data storage and retrieval capabilities which support them, although a means of adapting these techniques to existing SP resources may yet arise. If XSEDE were to continue to provide resources as always, it would continue to support the numerical analysis community but have little to offer to the analytics community. XSEDE is a large organization, and NSF awards for Track 2 systems are a slow means of steering, but awards to research cloud systems rather than traditional HPC systems, associated awards for science gateways research, and more sponsorship of the use of other computational capacities, including federated clouds, indicate that capability for this new type of community is on the way.

6.1 Cyberinfrastructure trends

While there will always be a place for traditional HPC utilization and highly parallel systems, it appears that the mix of resources provided by XSEDE will continue to evolve. The shift in usage of XSEDE resources towards the biological fields of science, and XSEDE's policy of providing capabilities which answer user-driven requirements, mean that the organization will be pushed to innovate and provide broader types of usage during

186 the next phase of the project. This is borne out by recent grants funded by the NSF. The NSF’s last three awards for Track 2 systems has shown the NSF’s preferences for a novel set of resources. The first of these is the Comet system at SDSC, a system which allows the creation of “vir- tual clusters” within a larger computational cluster. The PSC Bridges and Indiana University Jetstream system both provide research cloud capabil- ities, where users can start one or a number of virtual machine systems to carry out their analyses, then archive those systems once they are completed. The latest Track 1 award, the Stampede 2 system at TACC, provides 10 petaFLOPS of traditional parallel computing capability with accelerator-driven technologies. The first three systems provide flexibil- ity and deliver a high degree of configurability to the user, while the last firmly supports high-powered parallel computing and the highly complex task of offloading algorithms to the secondary processing units. I have seen the discussion on the campus champion mailing list branch out into questions about the use of technologies such as containers which provide lightweight virtualization platforms. Meanwhile, a broad range private providers, the most notable of which are Amazon EC2 and Microsoft Azure, continue to provide resources and support computation, providing cycles for hire. These systems are ulti- mately configurable and a broad range of solutions are implemented for automating the instantiation and running of these commercial viable sys- tems. While these solutions do have an expense, they do not have the
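To make this research cloud usage pattern concrete, the minimal sketch below walks through the start-analyze-archive lifecycle of a virtual machine. It is an illustration under stated assumptions, not XSEDE tooling: it presumes an OpenStack-backed cloud (Jetstream, for instance, is built on OpenStack) and the openstacksdk Python package, and the profile, image, flavor, and network names are hypothetical.

    import openstack

    # Connect using credentials stored in a clouds.yaml profile
    # ("my-research-cloud" is a hypothetical profile name).
    conn = openstack.connect(cloud="my-research-cloud")

    image = conn.compute.find_image("ubuntu-analysis-base")  # hypothetical image
    flavor = conn.compute.find_flavor("m1.medium")           # hypothetical flavor
    network = conn.network.find_network("project-net")       # hypothetical network

    # Start a virtual machine for the analysis run.
    server = conn.compute.create_server(
        name="analysis-run-01",
        image_id=image.id,
        flavor_id=flavor.id,
        networks=[{"uuid": network.id}],
    )
    server = conn.compute.wait_for_server(server)

    # ... the researcher logs in and carries out the analysis ...

    # Archive the completed system as a snapshot, then release the resources.
    conn.compute.create_server_image(server, name="analysis-run-01-archive")
    conn.compute.delete_server(server)

The appeal of this modality for the "long tail" user is visible in the sketch itself: the entire lifecycle is a handful of self-service calls rather than a batch queue and an allocations process.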

Meanwhile, a broad range of private providers, the most notable of which are Amazon EC2 and Microsoft Azure, continue to provide resources and support computation, offering cycles for hire. These systems are highly configurable, and a broad range of solutions have been implemented for automating the instantiation and running of these commercial systems. While these solutions do have an expense, they do not have the allocations process that XSEDE resources have, and they have a considerable advantage in visibility over the XSEDE project. Large-scale use of commercial cloud platforms can, however, also represent a significant expense.

The NSF supports a number of projects focused on the development of software to improve the user experience of cyberinfrastructure. Science gateways, which present a web-based front end to cyberinfrastructure resources organized by discipline, provide access to a set of commonly used codes, manage research data, and retrieve and share results. The NSF also funds initiatives to improve the reliability and standardization of software. While gateways provide a significant gain in ease of use, significantly reducing the amount of learning required to make use of resources, other initiatives to provide grid computing, such as the Unicore and Genesis II software projects, are still difficult to install, require special software to be installed on the resources, and present complex and obtuse user interfaces, hindering adoption. The main concern with NSF-funded software improvements is that, for the most part, they respond not to user-driven requirements but to proposed solutions, frequently from computer science researchers. The result is a deep mismatch felt between most cyberinfrastructure software providers and users. One exception is the Globus project, which has adopted a web-based approach to data transfer that is relatively easy to parse and understand and which, based on statistics incorporated into XSEDE reports, grows by about 150 users every quarter. For the most part, however, cyberinfrastructure software remains difficult to implement and use broadly.
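As an illustration of the comparatively approachable interface that helps explain Globus's steady growth, the minimal sketch below submits a data transfer through the Globus Python SDK (globus-sdk). The sketch is mine rather than part of any XSEDE deliverable, and the client ID, endpoint IDs, and paths are placeholders.

    import globus_sdk

    CLIENT_ID = "YOUR-APP-CLIENT-ID"     # placeholder: an app registered with Globus Auth
    SRC_ENDPOINT = "SOURCE-ENDPOINT-ID"  # placeholder endpoint UUIDs
    DST_ENDPOINT = "DEST-ENDPOINT-ID"

    # Interactive native-app login flow.
    auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
    auth_client.oauth2_start_flow()
    print("Log in at:", auth_client.oauth2_get_authorize_url())
    tokens = auth_client.oauth2_exchange_code_for_tokens(input("Authorization code: "))
    transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

    tc = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
    )

    # Describe and submit a recursive directory transfer between two endpoints.
    tdata = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT, label="example")
    tdata.add_item("/scratch/results/", "/archive/results/", recursive=True)
    task = tc.submit_transfer(tdata)
    print("Submitted transfer, task ID:", task["task_id"])

Once the transfer is submitted, the service manages retries and completion on the user's behalf, which is precisely the kind of delegation that the harder-to-adopt grid software packages have not offered.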

6.2 XSEDE 2 and what comes after

In an earlier chapter, I note that the TeraGrid was an effort to provide the basics of a distributed computing infrastructure, and XSEDE became a project around improving the service delivery processes of that infrastructure. Based on what I have seen since the first iteration of XSEDE ended in June of 2016, there is reason to believe that cyberinfrastructure will continue along this path, refining activities and incorporating improvements from other NSF cyberinfrastructure software programs, until the next large-scale change in 2020.

In 2015, the NSF invited the leadership of XSEDE to propose a five-year follow-on to the XSEDE project. In order to maintain visibility and name recognition with its target audience, the project continues under the name of XSEDE (the formal title of the award is "XSEDE 2.0: Integrating, Enabling and Enhancing National Cyberinfrastructure"). XSEDE 2.0 was shaped with a set of changes to the organization that would reduce the project's activities in creating software and encourage the adoption of software funded by other NSF projects and elsewhere, reinforce the training and outreach activities, including broadening participation efforts, and continue to provide robust access to CI resources. Furthermore, XSEDE 2.0 has extended its integrative framework to allow more Service Providers, including those not funded by the NSF, to interoperate with the project's resources. XSEDE 2.0 has also focused on providing frameworks that allow other resources to advertise capabilities available for use, even outside of XSEDE. In this, the project's focus has shifted from being the sole provider of resources to being a participant in a larger fabric of resources.

Not long after XSEDE 2.0 was awarded, by August of 2016, when I attended the quarterly management meeting, conversations that I engaged in turned to what the NSF's plans would be for the next iteration of cyberinfrastructure organization. XSEDE is unusual among NSF awards for two reasons: its size in dollars and the role of the project. The oversight and management that XSEDE requires, considering that it is a virtual organization spread across 15 partner programs, is quite extensive for the foundation. Most of the leadership of XSEDE participating in these conversations in August 2016 opined that the NSF would most likely solicit proposals for four activities related to the cyberinfrastructure, probably in the categories of operations, architecture, consulting and science gateways, and outreach. In addition, similar functions from other organizations, such as the Open Science Grid, would be folded in, creating four linked organizations which would provide a "stack" of overall national cyberinfrastructure services, coordinated by the NSF. There are a number of concerns with this potential development.

If the NSF funds four separate grants to complete the activities outlined above, and incorporates other cyberinfrastructure projects in the process, the burden will be on these organizations to coordinate with each other and to provide a consistent interface and environment throughout the transition between organizations. One of the first successes of the XSEDE project was the transition from TeraGrid to XSEDE operations without interruption in service for users. A national cyberinfrastructure stack uniting multiple projects would most likely take a considerable amount of critical work in order to continue operations without disturbing the activities of one or more of the original organizations. Furthermore, the operations organization would be forced either to support multiple differing ways to manage allocations of resources, or to find a way to coerce all of the member cyberinfrastructures to interoperate. The Service Provider model of XSEDE allows different centers to have different policies for their own systems but ensures a minimum level of interoperability, and perhaps this flexibility can be extended to further cyberinfrastructures. This would require those systems to become Service Providers to the future cyberinfrastructure stack, which may or may not be palatable to these projects. This is to say nothing of the cultural challenges of integrating multiple cyberinfrastructures with different norms about usage and technologies. Finally, the XSEDE organization has developed a significant understanding of the processes and activities that support not only the conduct of science but also the reporting of that support to the NSF and other stakeholders. A divided organization will possibly fragment that collected knowledge among the four members of the cyberinfrastructure stack, or the institutional knowledge needed to provide cyberinfrastructure services may dissipate in the new configuration, resulting in a lost investment for the NSF.

Based on the funding activities for XSEDE and other projects I have participated in, I have concerns that the Training and Outreach component in the new model could face difficulties depending on the level of resources the NSF allocates to it. An under-funded outreach organization will have problems recruiting users from the institutions which bring new perspectives and new disciplines, which means that resource allocations will most likely remain centered on the states with traditionally strong research universities, rather than diversifying to those in EPSCoR states or to Minority Serving Institutions. If the aim of the NSF is to broaden participation in these types of cyberinfrastructure investments and create a pipeline of new and diverse users, careful allocation of resources needs to be made to this future outreach organization. The NSF can only truly affect national outcomes by ensuring that all have access to high-quality computational resources. Activities to broaden the XSEDE user base by introducing new communities and new disciplines must be genuine. One measure of this is whether users who are introduced to the cyberinfrastructure have the opportunity to advance beyond their initial roles within the organization and are not restricted to being the "new users". It may be the case that the outreach program will have difficulties reaching across the cyberinfrastructure stack to the operations and consulting organizations. In order to ensure that HPC training activities go smoothly, there must be coordination between the trainers and those responsible for the system, and if future organizations introduce additional barriers to cooperation, training could suffer.

A split but interdependent organizational format also poses challenges for the kinds of activities I identify between researchers and XSEDE. While it may be possible to leverage participation in training activities with other NSF solicitations, users may not be able to draw on their legitimacy with the outreach organization to effectively gain access to the other organizations in the future cyberinfrastructure stack.

6.3 Science Policy lessons

Other NSF grants which reach or surpass XSEDE's scale have been for investments which, like XSEDE, provide a number of services for researchers. These include large-scale centers, such as the National Center for Atmospheric Research and the National Ecological Observatory Network; instruments, such as the Large Synoptic Survey Telescope (LSST); and research vessel operations, which are by their nature expensive undertakings. These initiatives, and many others being funded by the NSF at smaller levels and in different directorates, such as services to provide data, methods, and analyses to many researchers in the same field, represent a new direction for the NSF's activities. These types of projects are service delivery organizations for a broad set of investigations rather than, as with the centers, laboratories, and instruments listed above, the focal point for a particular discipline or area of inquiry. As in the progression from TeraGrid to XSEDE to XSEDE 2.0, these projects are being redefined to become service delivery organizations. These future investments by the NSF must create organizations that provide resources for a broad range of researchers across many disciplines; those organizations will be charged by the NSF to identify the needs of these communities, capture metrics on the type and extent of activities engaged in, and provide extensive documentation on performance. While it may be arguable that the NSF, which is largely made up of highly ranked scientists in their respective fields, collectively knows quite a bit about the conduct of good science, it remains to be seen whether these same scientists have the same level of acumen concerning these types of service delivery organizations. It seems that the NSF will need to adapt and learn as it shifts the types of offerings it supports.

Furthermore, it appears that researchers are content to adapt private industry resources to their needs. More and more research-centered cloud-based activities are beginning to appear, and private offerings are contenders as ways to provide access to computational resources. While there are few offerings that provide actual access to HPC resources, new models are being offered that support activities with large data sets and the need for analytical processing of data. The "long tail" of science is incorporating these types of resources for its own usage, based on their flexibility, access, or simplicity of use. While next-generation HPC capabilities will remain the purview of the NSF (after that of the DoE and DoD labs), there will be demand for other resources which meet the needs of the research community for quick computational tasks.

6.4 Areas for further research

This research project has provided an in-depth perspective on the XSEDE project, supporting ethnographic description with quantitative data based on the use of XSEDE. There are a number of areas for future inquiry that may provide additional understanding of the role that these virtual organizations play in the NSF's ecosystem, and some direction about how best to create solicitations and structure organizations for the support of science. Based on the limited amount of change in XSEDE 2.0 from the initial organization, it may be the case that this organization is a less interesting subject for observation than its earlier incarnation. It may be more fruitful to turn a similar lens on the Open Science Grid, for example, to try to understand the forces that drive that project and the changes the OSG makes. That being said, the behavior of XSEDE 2.0 participants as the time approaches for the NSF to release its solicitation to replace XSEDE may be particularly informative in terms of what happens when the partners in such a large collaboration have increasing incentives to compete with each other rather than cooperate.

Firstly, understanding XSEDE as a virtual organization, capturing challenges in this form of work and the innovative responses that XSEDE has developed, gives considerable insight into virtual organizations and the types of work that cycle through such an organization. The virtual organization alone is not the only level of inquiry that may provide useful information. A study of one of the centers that serves as an XSEDE level one Service Provider would be informative in grasping the attitudes of these centers toward XSEDE's consolidating function, as well as additional ways of interacting with stakeholders and managing users. In contrast to the central organization, there may be activities at the edge that are meaningful to understanding the relationship between resources and science. This type of investigation would inform the perspective of the virtual organization by providing more detail on who the participants are and how they interact with XSEDE.

Secondly, development of the linkage between resources consumed and publications would be useful to inform science policy and other scientometric pursuits. Being able to draw relationships between the amount of resources utilized and the type and frequency of publication should provide the NSF some guidance about the appropriate level and type of support. This also provides a possibility of monitoring the performance of a particular discipline based on resources consumed. In light of the previous section's discussion of new private resources, this also calls into question our understanding of which researchers end up not making use of the resources as originally planned. In part this is an exercise in looking for the missing researchers, but careful surveying and conversations with researchers who might provide good examples of this kind of usage could help generate further questions about the utilization of private resources in comparison to the NSF's offerings, and provide ideas about what barriers these researchers perceive in those offerings. There may be elements of the NSF-provided activities which are not as immediately obvious from within the organization as they are to the researchers.
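As a sketch of how such a linkage might begin, the fragment below correlates CPU hours consumed with publications reported across a handful of projects. The records are invented for illustration; they are not XSEDE data, and a real analysis would draw on the allocation and publication databases discussed earlier.

    import numpy as np

    # Hypothetical per-project records: (CPU hours consumed, publications reported).
    projects = {
        "TG-ABC123": (1_200_000, 4),
        "TG-DEF456": (350_000, 2),
        "TG-GHI789": (5_800_000, 9),
        "TG-JKL012": (90_000, 1),
    }

    hours = np.array([v[0] for v in projects.values()], dtype=float)
    pubs = np.array([v[1] for v in projects.values()], dtype=float)

    # Usage spans orders of magnitude, so correlate publications with log-hours.
    r = np.corrcoef(np.log10(hours), pubs)[0, 1]
    print(f"Pearson r between log10(CPU hours) and publications: {r:.2f}")

Even this toy version makes the policy question visible: a strong correlation would suggest that allocation size tracks output, while a weak one would point toward the missing researchers discussed above.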


Appendix A

Interview Questions

Starting questions for interviews with cyberinfrastructure users and staff

1. I'm going to ask you a few questions about your background in research and cyberinfrastructure, to understand what your experiences have been.
   (a) Tell me a little about your background, where you went to school, and about your current job.
   (b) Tell me briefly how you're involved with national cyberinfrastructure (CI): XSEDE, Open Science Grid, or otherwise?
       • (if staff: What is your role within the project and what do you do?)
   (c) How did you become involved with computational infrastructure? With (CI) specifically?
   (d) How did you learn to start using (CI)?
   (e) Were there difficulties you encountered?
   (f) What did you do in order to overcome those difficulties?
       • (get help from others/read documentation/support tickets/courses)
   (g) Was there anything about your own background that made it easier or harder to make use of (CI)?
   (h) Has your use of (CI) changed since you first engaged with it? How?
   (i) Has the use of (CI) by others in the community changed in the same time? How?

2. User Questions
   (a) Do you feel that (CI) is important to your own work? How?
   (b) Would more access to the (CI) resources that you currently use allow you to get more science done?
   (c) Would different resources (cloud vs HTC vs HPC) allow you to get more science done?
   (d) Are there other services that (CI) provides that give more benefits than just access to the resources?
   (e) Do you feel like you are part of the general population of (CI) users? Do you feel like you belong to part of a group within the (CI) user base? Describe your group: what are its concerns and what are its values in relation to the project? Are you aware of other groups within the (CI) community?
   (f) How do you perceive the allocation of resources within (CI)? Is it fair? Does your group (if identified) get an equitable share of the resources requested? Why? Do other groups receive a different share? Why?
   (g) In the context of (CI), do you most often work alone or with others, from the same group or from other groups, or with staff members?

3. Staff Questions
   (a) Does your background (in computational sciences, IT, or another field, from 1.a) support or affect your role in working on (CI)?
   (b) Do you feel your support of (CI) contributes to the development of basic science? How?
   (c) Does the current set of available resources meet the needs of the user community? Where does it not meet those needs?
   (d) What role do you see (CI) playing in researcher activities: a partner, collaborator, instrument, or something else? Why? Should researchers be asked to cite or otherwise acknowledge (CI) in their publications?
   (e) When discussing resource allocation, the high demand for resources compared to the available systems is often discussed. In your view, what makes a project more likely to receive an allocation (or other resources)? Are there factors which make the process difficult for new users?
   (f) What do you think about new ways of accessing resources, such as via science gateways? What implications does this have for CI providers? For users?

4. Collaborative Science
   (a) Users:
       i. How would you characterize your own research work: largely your own work or conducted in collaboration with other researchers? In your broader field?
       ii. Do you use science gateways or other similar technologies (prompt: such as HubZero or Cyverse) in your own research? Why or why not?
       iii. Do you understand science gateways as contributing to use of (CI)? How?
       iv. How do you understand analyses carried out in (CI) to be the product of collaborative or individual work?
       v. Do you feel that (CI) is a collaborative effort? How would you describe (CI) in relation to the research that you carry out? [prompt: as a tool? A partner? A co-creator? Something else?]
   (b) Staff:
       i. Do you see (CI) as enabling the collaborative practice of science? Why or why not?
       ii. In your work, do you engage with collaborative research projects? In what ways?
       iii. Are you involved in other collaborative efforts than (CI)?
       iv. How does your collaborative work align with competitive activities such as grant proposals and similar? Are there any particular actions that you take in order to manage the balance of these relationships?

5. Demographics of science questions
   (a) What are things that make it difficult to progress in your particular field of science?
   (b) What are things that make it difficult to make effective use of cyberinfrastructure?
   (c) What do you see students struggle with in progressing within your field? With making use of cyberinfrastructure to get results in their research?
   (d) Are there factors that make these barriers more difficult for some researchers than others? (prompt: resources of local institution, structural factors, fundamental education)

Richard Knepper
[email protected] · homes.soic.indiana.edu/rknepper · ORCID: 0000-0002-4296-9421

Professional Profile

• Senior Manager in Research IT with 15 years of experience consulting with and providing operations support for researchers at Indiana University and other institutions, including collaborating on grant projects

• Project leader in multiple NSF- and NASA-funded projects, with roles including drafting proposals and statements of work, budgeting, managing operations staff, and establishing and reporting metrics and key performance indicators. Works across multiple groups in virtual organizations to accomplish goals.

• Researcher in cyberinfrastructure studies, examining the distribution of resources within high-performance computing environments supporting basic research

Areas of Expertise

Leadership: IT management and operations; grant-funded project management; project management and system implementation; training and development; budgeting and metrics
Technical: Linux/Unix system administration; Apache/Tomcat; MySQL administration; ticket tracking systems; monitoring systems; Perl, Python, PHP
Analytics: social network analysis; univariate and multivariate statistics; linear programming; R, Sci2, Gephi, Pajek

Professional Experience

2012-Present: Manager, Campus Bridging and Research Infrastructure, University Information Technology Services, Indiana University.
Co-Investigator for the Indiana University role in XSEDE (Extreme Science and Engineering Discovery Environment, NSF OCI-1053575); Co-Investigator for Indiana University EAGER: Best Practices and Models for Robust Cyberinfrastructure Software (NSF 1147606); Principal Investigator for the IU subcontract to the University of Kansas Center for the Remote Sensing of Ice Sheets/NASA: Operation IceBridge: A 3-year Airborne Mission for Earth's Polar Ice; project management for research projects making use of IU's research cyberinfrastructure.

2007-2012: Manager, Research Technologies Core Services, University Information Technology Services, Indiana University.
Systems analysis and integration for the Polar Grid project (NSF CNS-0723054), including proposal budgeting, coordinating software and hardware implementation, and planning and completion of fieldwork at polar sites; Resource Provider Lead for the FutureGrid project (NSF OCI-0910812), including chairing the User Services Committee and managing support systems for FutureGrid users; management of accounts, accounting, and reporting for the TeraGrid project, including compiling quarterly report information.

2005-2007: Senior Unix Systems Analyst, Unix Systems Support Group, University Information Technology Services, Indiana University.
Support of faculty, students, and staff using Unix and Linux at IU; curriculum planning and teaching of Unix System Administration Education Certification and Unix for Advanced Users classes.

2002-2005: Lead Digital Media Analyst, Digital Media Network Services, University Information Technology Services, Indiana University.
System administration for the VRVS reflector, EMC CLARiiON NAS device, and streaming video servers; system administration for the videoconferencing global management system and associated services.

Education

2017: PhD, School of Informatics and Computing, Indiana University. Social Informatics. Dissertation: Shifting Modalities of Use in the XSEDE Project
1999-2002: Master of Public Administration, School of Public and Environmental Affairs, Indiana University. Specialization: Information Systems
1999-2002: Master of Arts, Polish Studies, Russian and East European Institute, Indiana University. Master's thesis: Evaluation of the ePolska Plan for Information Technology Development
1993-1997: Bachelor of Science, New College of Florida

Selected Recent Publications

2016: Knepper, R. and Börner, K. (2016). Comparing the Consumption of CPU Hours with Scientific Output for the Extreme Science and Engineering Discovery Environment (XSEDE). PLOS ONE (in press).
Knepper, R. and Chen, Yu-Che. (2016). Situating Cyberinfrastructure in the Public Realm: The TeraGrid and XSEDE Projects. (accepted at the dg.o '16 conference, June 08-10, 2016).
2015: Knepper, R., Standish, M., and Link, M. (2015). Big data on ice: The forward observer system for in-flight synthetic aperture radar processing. Procedia Computer Science, 51:1504-1513.
Stewart, C. A., Barnett, W. K., Wernert, E. A., Wernert, J. A., Welch, V., and Knepper, R. (2015). Sustained software for cyberinfrastructure: Analyses of successful efforts with a focus on NSF-funded software. In Proceedings of the 1st Workshop on The Science of Cyberinfrastructure: Research, Experience, Applications and Models, pages 63-72. ACM.
Stewart, C. A., Roskies, R., Knepper, R., Moore, R. L., Whitt, J., and Cockerill, T. M. (2015). XSEDE value added, cost avoidance, and return on investment. In Proceedings of the 2015 XSEDE Conference: Scientific Advancements Enabled by Enhanced Cyberinfrastructure, page 23. ACM. (Recipient of the Phil Andrews Best Technology Paper Award)
2014: Fischer, J., Knepper, R., Standish, M., Stewart, C. A., Alvord, R., Lifka, D., Hallock, B., and Hazlewood, V. (2014). Methods for creating XSEDE compatible clusters. In Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, page 74. ACM.

Complete publications list available at http://homes.soic.indiana.edu/rknepper/publications