COMPUTATIONAL METHODS IN SCIENCE AND TECHNOLOGY, Special Issue 2010, 81-97

Science on the TeraGrid

Daniel S. Katz1,2,3, Scott Callaghan4, Robert Harkness5, Shantenu Jha2,6,7, Krzysztof Kurowski8, Steven Manos9*, Sudhakar Pamidighantam10, Marlon Pierce11, Beth Plale11,12, Carol Song13, John Towns10

1 Computation Institute, University of Chicago and Argonne National Laboratory, USA
2 Center for Computation & Technology, Louisiana State University, USA
3 Department of Electrical and Computer Engineering, Louisiana State University, USA
4 University of Southern California, USA
5 San Diego Supercomputer Center, University of California San Diego, USA
6 Department of Computer Science, Louisiana State University, USA
7 e-Science Institute, University of Edinburgh, UK
8 Poznan Supercomputing and Networking Center, Poland
9 Information Technology Services, University of Melbourne, Australia
10 National Center for Supercomputing Applications, University of Illinois, USA
11 Pervasive Technology Institute, Indiana University Bloomington, USA
12 School of Informatics and Computing, Indiana University Bloomington, USA
13 Rosen Center for Advanced Computing, Purdue University, USA

* Steven Manos was a Research Fellow at University College London while this work was conducted.

(Received: 21 May 2010; revised: 29 October 2010; published online: 23 November 2010)

Abstract: The TeraGrid is an advanced, integrated, nationally-distributed, open, user-driven, US cyberinfrastructure that enables and supports leading-edge scientific discovery and promotes science and technology education. It comprises supercomputing resources, storage systems, visualization resources, data collections, software, and science gateways, integrated by software systems and high-bandwidth networks, coordinated through common policies and operations, and supported by technology experts. This paper discusses the TeraGrid itself, examples of the science that is occurring on the TeraGrid today, and applications that are being developed to perform science in the future.

Key words: computational science applications, high performance computing, grid computing, production grid infrastructure

I. INTRODUCTION

TeraGrid is an advanced, integrated, nationally-distributed, open, user-driven, US computational science cyberinfrastructure that enables and supports leading-edge scientific discovery and promotes science and technology education. It is operated in a partnership comprising the Grid Infrastructure Group (GIG) and eleven Resource Provider (RP) institutions. The TeraGrid includes supercomputing resources, storage systems, visualization resources, data collections, software, and science gateways, integrated by software systems and high-bandwidth networks, coordinated through common policies and operations, and supported by technology experts. The TeraGrid has been supported by funding from the National Science Foundation's (NSF) Office of Cyberinfrastructure (OCI) and significant matching funds from the participating institutions since it began as the Distributed Terascale Facility in 2001.

The TeraGrid began in 2001 when the U.S. National Science Foundation (NSF) made an award to four centers to establish a Distributed Terascale Facility (DTF). The DTF became known to users as the TeraGrid, a multi-year
effort to build and deploy the world's largest, fastest, most comprehensive, distributed infrastructure for general open scientific research. The initial TeraGrid was homogeneous and very "griddy", with users foreseen to be running on multiple systems, both because their codes could run "anywhere", and because in some cases, multiple systems would be needed to support the large runs that were desired. The software that made up the TeraGrid was a set of identical packages on all systems. The TeraGrid has since expanded in capability and number of resource providers. This introduced heterogeneity and thus added complexity to the grid ideals of the initial DTF, as the common software could no longer be completely identical. This led to the concept of common interfaces, with potentially different software underneath the interfaces. Additionally, the users of national center supercomputers were merged into TeraGrid, which led to TeraGrid increasing its focus on supporting these users and their traditional parallel/batch usage modes.

The TeraGrid is freely available to US researchers, with allocations determined by a peer-review mechanism. The TeraGrid resources are generally at least 50% dedicated to TeraGrid. Roughly similar to the TeraGrid, in terms of a continental-scale set of resources with a focus on HPC users, is DEISA/PRACE in Europe, a set of national resources with some fractions provided to European users, again determined through a review process. An alternative type of infrastructure is one more focused on high-throughput computing (HTC), initially in high-energy physics, but increasingly in other domains as well. Filling this role in the US is the Open Science Grid (OSG), and in Europe, Enabling Grids for E-sciencE (EGEE), which has recently been transformed into the European Grid Infrastructure (EGI).

This paper discusses the TeraGrid itself (Section II), examples of the interesting current uses of the TeraGrid cyberinfrastructure for science (Section III), and examples of applications that are being developed to perform science on the TeraGrid in the future (Section IV)1.

1 The examples in this paper were chosen by the authors from the much larger set of science being performed and planned on TeraGrid for having particularly interesting interactions with cyberinfrastructure. It is likely that another set of authors writing a similar paper would choose other examples.

II. THE TeraGrid

The TeraGrid's mission is to advance the nation's scientific discovery, ensure global research leadership, and address important societal issues. This is done through a three-pronged strategy, known as deep, wide, and open:

• Deep: ensure profound impact for the most experienced users, through provision of the most powerful computational resources and advanced computational expertise. The TeraGrid's deep goal is to enable transformational scientific discovery through leadership in the use of high-performance computing (HPC) for high-end computational research. The TeraGrid is designed to enable high-end science utilizing powerful supercomputing systems and high-end resources for the data analysis, visualization, management, storage, and transfer capabilities required by large-scale simulation and analysis. All of this requires an increasingly diverse set of leadership-class resources and services, and deep intellectual expertise in the application of advanced computing technologies.

• Wide: enable scientific discovery by broader and more diverse communities of researchers and educators who can leverage TeraGrid's high-end resources, portals, and science gateways. The TeraGrid's wide goal is to increase the overall impact of TeraGrid's advanced computational resources on larger and more diverse research and education communities. This is done through user interfaces and portals, domain-specific gateways, and enhanced support to facilitate scientific discovery by people who are not high-performance computing experts. The complexity of using TeraGrid's high-end resources will continue to grow as systems increase in scale and evolve with new technologies. TeraGrid broadens its user base by providing simpler, but powerful, interfaces to resources, such as establishing common user environments and developing and hosting Science Gateways and portals. TeraGrid also provides focused outreach and collaboration with science domain research groups, and conducts educational and outreach activities that help inspire and educate the next generation of America's leading-edge scientists.

• Open: facilitate simple integration with the broader cyberinfrastructure through the use of open interfaces, partnerships with other grids, and collaborations with other science research groups that deliver and support open cyberinfrastructure facilities. TeraGrid's open goal is twofold: to enable the extensibility and evolution of the TeraGrid by using open standards and interfaces, and to ensure that the TeraGrid is interoperable with other open, standards-based cyberinfrastructure facilities. While TeraGrid only provides (integrated) high-end resources, it must enable its high-end cyberinfrastructure to be more accessible from, and even federated or integrated with, cyberinfrastructure of all scales. That includes not just other grids, but also campus cyberinfrastructures and even individual researchers' labs or systems. The TeraGrid leads the community forward by providing an open infrastructure that enables, simplifies, and encourages scaling out to its leadership-class resources by establishing models in which computational resources

TeraGrid management and planning is coordinated via a series of regular meetings, including weekly TeraGrid AD teleconferences, biweekly TG Forum teleconferences, biweekly project-wide Round Table meetings (held via Access Grid), and quarterly face-to-face internal project meetings. Management of GIG activities is led by the GIG Project Director and accomplished primarily by the GIG Area Directors. Coordination of project staff in terms of detailed technical analysis and planning, particularly across RP and GIG efforts, is done through two types of technical