Self-Scaling Clusters and Reproducible Containers to Enable Scientific Computing Peter Z. Vaillancourt∗, J. Eric Coultery, Richard Knepperz, Brandon Barkerx ∗Center for Advanced Computing Cornell University, Ithaca, New York, USA Email:
[email protected] yCyberinfrastructure Integration Research Center Indiana University, Bloomington, IN, USA Email:
[email protected] zCenter for Advanced Computing Cornell University, Ithaca, New York, USA Email:
[email protected] xCenter for Advanced Computing Cornell University, Ithaca, New York, USA Email:
[email protected] Abstract—Container technologies such as Docker have become cyberinfrastructure at a number of campuses, enabling them a crucial component of many software industry practices espe- to provide computational resources to their faculty members cially those pertaining to reproducibility and portability. The con- [2]. As more organizations utilize cloud resources in order tainerization philosophy has influenced the scientific computing community, which has begun to adopt – and even develop – con- to provide flexible infrastructure for research purposes, CRI tainer technologies (such as Singularity). Leveraging containers has begun providing tools to harness the scalability and utility for scientific software often poses challenges distinct from those of cloud infrastructure, working together with the members encountered in industry, and requires different methodologies. of the Aristotle Cloud Federation Project [3]. Red Cloud, the This is especially true for HPC. With an increasing number Cornell component of the Aristotle Cloud Federation, provides of options for HPC in the cloud (including NSF-funded cloud projects), there is strong motivation to seek solutions that provide an OpenStack cloud services much like the Jetstream resource, flexibility to develop and deploy scientific software on a variety that is available both to Cornell faculty as well as to Aristotle of computational infrastructures in a portable and reproducible members.