Research Computing and Cyberinfrastructure ‐ The Sustainability Model at Penn State

Vijay K. Agarwala, Senior Director, Research Computing and Cyberinfrastructure, The Pennsylvania State University, University Park, PA 16802 USA, [email protected]

May 3rd–5th, 2010, at Cornell University Center for Advanced Computing

1 Organizational Structure of Research Computing and CI at Penn State

[Organizational chart, summarized:]
• University President
• Provost and Executive Vice President; Senior Vice President for Research and Dean of the Graduate School; Senior Vice President for Finance and Business
• Vice Provost for Information Technology and Chief Information Officer; Associate Vice President, Office of Physical Plant
• Director of Research Computing and Cyberinfrastructure; Director of Telecommunications and Networking Services; Associate VP for Research; Executive Director, Institute of Cyber Science; RCC Advisory Committee (up to seven faculty members and research administrators from the University Research Council); RCC Executive Committee
• Four groups within Research Computing and Cyberinfrastructure, each with 4 staff members: High Performance Computing Systems; Domain Specific and Consulting Support; Development and Programming Support; Telecollaborative Systems

2 Research Computing and Cyberinfrastructure

• Provide systems services by researching current practices in operating systems, file systems, data storage, and job scheduling, as well as computational support related to compilers, parallel computations, libraries, and other software. Also support visualization of large datasets by innovative means to gain better insight from the results of simulations.
• Enable large-scale computations and data management by building and operating several state-of-the-art computational clusters and machines with a variety of architectures.
• Consolidate and thus significantly increase the research computing resources available to each faculty participant. Faculty members can frequently exceed their share of the machine to meet peak computing needs.
• Provide support and expertise for using programming languages, libraries, and specialized data and software for several disciplines.
• Investigate emerging visual computing technologies and implement leading-edge solutions in a cost-effective manner to help faculty better integrate data visualization tools and immersive facilities in their research and instruction.
• Investigate emerging architectures for numerically intensive computations and work with early-stage companies, for example on interconnects, networking, and graphics processors for computations.
• Help build inter- and intra-institutional research communities using cyberinfrastructure technologies.
• Maintain close contacts with NSF- and DoE-funded national centers, and help faculty members with porting and scaling of codes across different systems.

3 Programs, Libraries, and Application Codes in Support of Computational Research

• Compilers and Debuggers: DDT, GNU Compilers, Intel Compiler Suite, NVIDIA CUDA, PGI Compiler Suite, TotalView
• Computational Biology: BLAST, BLAST+, EEGLAB, FSL, MRIcro, MRIcron, SPM5, SPM8, RepeatMasker, wuBlast
• Chemistry and Material Science: Accelrys, Amber, CCP4, CHARMM, CPMD, GAMESS, Gaussian 03, GaussView, Gromacs, LAMMPS, NAMD, NWChem, Rosetta, Schrödinger Suite, TeraChem, ThermoCalc, VASP, WIEN2k, WxDragon
• Finite Element Solvers: ABAQUS, LS-DYNA, MD Nastran and MD Patran
• Fluid Dynamics: Fluent, GAMBIT, OpenFOAM, Pointwise
• Mathematical and Statistical Libraries and Applications: AMD ACML, ATLAS, BLAS, IMSL, LAPACK, GOTO, Intel MKL, Mathematica, MATLAB, Distributed MATLAB, NAG, PETSc, R, SAS, WSMP
• Multiphysics: ANSYS, COMSOL
• Optimization: AMPL, CPLEX, GAMS, MATLAB Optimization Toolbox, Matgams, OPL
• Parallel Libraries: OpenMPI, Parallel IMSL, ScaLAPACK
• Visualization Software: Avizo, Grace, IDL, Tecplot, VisIt, VMD

All software installations are driven by faculty. The software stack on every system is customized and entirely shaped by faculty needs.

4 Compute Engines

Core Compute Engines

System      Nodes                      Cores   Memory (GB)   Interconnect
Lion-XO     64                         192     1280          SDR InfiniBand
Lion-XB     16                         128     512           SDR InfiniBand
Lion-XC     140                        560     1664          SDR InfiniBand
Lion-XJ     144                        1152    5120          DDR InfiniBand
Lion-XI     64                         512     4096          DDR InfiniBand
Lion-XK     64                         512     4096          DDR InfiniBand
Lion-XH     48                         384     2304          QDR InfiniBand
Lion-CLSF   1 virtual node (ScaleMP)   128     768           QDR InfiniBand
CyberStar   224                        2048    7680          QDR InfiniBand
Hammer      8                          64      1024          GigE
Tesla       2 servers                  16      32            GigE
            1 NVIDIA S1070             960     16            GigE

• A 48-port 10 GigE switch connects all compute engines and storage
• A 1280-core system is coming online in Summer 2010
• All compute engines together are expected to deliver 40 million core hours in 2010
• Emerging technologies in the hands of the user community: ScaleMP and a GPU cluster

Storage

System                          Capacity   Performance
IBM GPFS Parallel File System   550 TB     Can sustain 5+ GB/s

5 Job Scheduling

Scheduling Tools
• Torque is the resource manager
• Moab is the job scheduler

Participating Faculty Partners
• Each partner group gets its own queue in Torque
• Each partner queue is assigned a fair-share usage target in Moab equivalent to the group's percentage of ownership in the specific compute engine
• A 30-day fair-share window is used to track partner group usage
• When a partner group is below its target, the jobs in its queue are given a priority boost
• When a partner group is at or above its target, the jobs in its queue get no priority boost but are also not penalized (they are treated on par with general users)
• Within a partner group queue there is a smaller fair-share priority among the members of that group to keep anyone from monopolizing the group queue
• Partner group queues do not limit the number of cores their members use
• Partner group queues have, by default, a two-week limit on the run time of jobs, but this limit can be adjusted as necessary

General Users
• General users have access to a public queue and a small fair-share priority
• General users are limited in the number of processor cores they can have in use (32) and the length of their jobs (24 hours)
• A sketch of the resulting fair-share priority logic is given below
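The following Python sketch only mirrors the rules listed above; it is illustrative, not Moab's actual priority formula. The function names, the boost magnitude, and the example numbers are hypothetical.

```python
# Illustrative sketch of the fair-share policy described above.
# Numbers and function names are hypothetical, not Moab internals.

def fair_share_boost(usage_core_hours: float,
                     target_fraction: float,
                     total_core_hours: float,
                     boost: float = 1000.0) -> float:
    """Return a priority boost for a partner group.

    usage_core_hours -- group's usage over the 30-day fair-share window
    target_fraction  -- group's ownership share of the compute engine (0-1)
    total_core_hours -- all core hours delivered in the same window
    """
    actual_fraction = usage_core_hours / total_core_hours
    if actual_fraction < target_fraction:
        # Below target: jobs in the group's queue get a priority boost.
        return boost
    # At or above target: no boost, but no penalty either
    # (treated on par with general users).
    return 0.0


def job_priority(queue_time_hours: float, group_boost: float,
                 member_share_adjustment: float = 0.0) -> float:
    """Combine queue wait time, the group-level boost, and the smaller
    per-member fair-share adjustment within the group queue."""
    return queue_time_hours + group_boost + member_share_adjustment


if __name__ == "__main__":
    # Example: a group owning 10% of the engine has used only 6% of the
    # core hours delivered in the last 30 days, so its jobs get a boost.
    boost = fair_share_boost(usage_core_hours=60_000,
                             target_fraction=0.10,
                             total_core_hours=1_000_000)
    print(job_priority(queue_time_hours=12.0, group_boost=boost))
```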

6 Usage of the LION-XJ Compute Cluster, January 1 – April 22, 2010

[Figure: ratio of actual usage to target share for the 16 partner groups (group1–group16) on the LION-XJ compute cluster over the period. Twelve groups exceeded their "target share" at some point during the period; only four groups remained below the "target share" for the entire period.]

7 Visualization Services

Staff members provide consulting, teach seminars, assist faculty and support facilities for visualization and VR.

Recent support areas include:
• Visualization and VR system design and deployment
• 3D modeling applications: FormZ, Maya
• Data visualization applications: OpenDX, VTK, VisIt
• 3D VR development and device libraries: VRPN, VRML, Java 3D, OpenGL, OpenSG, CAVELib
• Domain-specific visualization tools: Avizo (Earth, Fire, Green, Wind), VMD, SCIRun
• Telecollaborative tools and facilities: Access Grid, inSORS, VNC
• Parallel graphics for scalable visualization: ParaView, DCV, VisIt
• Programming support for graphics (e.g., C/C++, Java 3D, Tcl/Tk, Qt)

8 Visualization Facilities

Our goal is to foster more effective use of visualization and VR techniques in research and teaching across colleges and disciplines via strategic deployment of facilities and related support.

• Locating facilities strategically across campus for convenient access by targeted disciplines and user communities
• Leveraging existing applications and workflows so that visualization and VR can be natural extensions to existing work habits for the users being served
• Providing outreach seminars, end-user training, and ongoing staff support in use of the facilities
• Working on an ongoing basis with academic partners to develop and adapt these resources more precisely to fit their needs
• Helping to identify and pursue funding opportunities for further enhancement of these efforts as they take root

9 Visualization Facilities

• Immersive Environments Lab: in partnership with the School of Architecture and Landscape Architecture
• Immersive Construction Lab: in partnership with Architectural Engineering
• Visualization/VR Lab: in partnership with the Materials Simulation Center
• Visualization/VR Lab: in partnership with Computer Science and Engineering
• Sports Medicine VR Lab: a partnership between Kinesiology, Athletics, and Hershey Medical Center

10 Building a strong University-based Computing Center (UBCC): Top 10 practices

• Appeal to a broad constituency: science, engineering, medicine, business, humanities, liberal arts. Provision 25% of compute cycles with minimal requirements of proposal and review.
• Flexibility in system configuration and adaptability to faculty needs: work with partners on their queue requirements, accommodate temporary priority boosts.
• Keep barriers to faculty participation low: as low as a $5,000 investment for a contributing partner with priority access. Offer a try-before-you-buy program and guest status for prospective faculty partners.
• Maximize system utilization: consistently above 90%; even contributing partners don't have instantaneous access.
• Extensive software stack and rapid turnaround in installation of new software: a rich set of tools, compilers, libraries, community codes, and ISV codes.
• Provide consulting with subject-matter expertise: differentiate the UBCC from "flop shops" and cloud providers.
• Strong commitment to training: teaching in classes, seminars, and workshops.
• Provide accurate and daily system utilization data.
• Build strong partnerships with hardware and software vendors.
• Make emerging technologies and test beds available to faculty and students.

11 Cloud Computing – How will it impact Research CI at Universities?

Cloud computing: a virtualized/remote computing resource from which users can purchase what they need, when they need it

[Examples shown on the original slide, with providers identified by logo:]
• A clearinghouse for buyers and sellers of Digital Analysis Computing (DAC) services
• Computing resources for the commercial and academic community
• An affiliated provider offering HPC services to industry, government, and academia

12 Cloud Computing – How will it impact Research CI at Universities?

• Google+IBM cloud: a platform for computer science researchers to build cloud applications, combining Google machines with IBM BladeCenter and System x servers
• NSF awarded $5.5 million to fourteen universities through its Cluster Exploratory (CluE) program

• Open Cirrus: a global open-source test bed for the advancement of cloud computing research
• Participants include the University of Illinois, KIT (Germany), IDA (Singapore), ETRI (Korea), MIMOS (Malaysia), and the Russian Academy of Sciences

• Computing in the Cloud (CiC): NSF Solicitation 10-550 (~$5 million in FY 2010)

13 Cloud Computing – How will it impact Research CI at Universities?

• What impact will the purchase of commercial cloud services have on the future development of campus, regional, and national CI?
• How tightly should campus CI be integrated with that of private companies in a geographic region? Would it promote stronger partnerships between academia and industry?
• Would new policies and guidelines emerge from federal funding agencies on incorporating cloud services in grants? Would cloud service providers have to be US-based for data and security reasons?

Demand for cloud computing services by campus-based researchers will depend on how research-computing-related CI is funded and deployed in the future and, just as importantly, on how well university-based computing centers (UBCC) respond to it and stay competitive.

14 A model for a University-based Computing Center (UBCC)

Assumes capital expenditures have been incurred for a data center as well as hardware and software. For example, a $10 million investment today in a green data center with a PUE of 1.25 and lower levels of redundancy can create a facility with 3,200 sq. ft. of raised floor, 3,000 sq. ft. of office space, 1.5–2.0 megawatts of electrical power, limited-capacity UPS, and HVAC systems, all housed in a 12,000 sq. ft. building. A similar $10 million investment in hardware today can build aggregate peak computing capacity between 250 and 400 teraflops (25,000–40,000 cores) and 10–20 petabytes of storage.

Annual Operating Expenditure
• Hardware: $2.50 million, replacing 25% of installed compute capacity each year
• Software: $0.25 million in annual licensing costs
• Utility cost: $1.00 million (80 racks, avg. 15 kW/rack)
• Staff (16 people): $2.00 million in salary and benefits
• Other expenses: $0.25 million
• Total: $6.00 million annual investment in support of research, teaching, and outreach
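The utility line item can be sanity-checked with a short calculation. The sketch below derives the implied electricity rate from the rack count and power density above and the PUE of 1.25 quoted for the facility, assuming year-round operation at the stated average load; it is a rough check, not a figure from the slide.

```python
# Back-of-the-envelope check of the $1.00 million utility line item,
# using 80 racks at an average of 15 kW/rack and a PUE of 1.25.
# Assumes continuous operation at the stated average load.

HOURS_PER_YEAR = 8760

it_load_kw = 80 * 15                                   # 1,200 kW of IT load
facility_energy_kwh = it_load_kw * HOURS_PER_YEAR * 1.25
implied_rate = 1_000_000 / facility_energy_kwh

print(f"Facility energy: {facility_energy_kwh / 1e6:.1f} GWh/year")
print(f"Implied electricity rate: ${implied_rate:.3f} per kWh")
# -> roughly 13.1 GWh/year and about $0.076 per kWh, a plausible
#    blended utility rate for continuous operation.
```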

Once the initial capital expenditures have been made, it is possible to deliver compute cycles, with a high level of system and computational staff support, at a cost between $0.025 and $0.04 per core hour (2.4–3.0 GHz cores).
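As a rough check on that per-core-hour range, the sketch below divides the $6.00 million annual operating expenditure by delivered core hours. The core counts come from the hardware description above; the average utilization values are illustrative assumptions, not figures from the slide.

```python
# Back-of-the-envelope check of the $0.025-$0.04 per core-hour range.
# Core counts are from the slide; utilization values are assumptions.

HOURS_PER_YEAR = 8760
ANNUAL_OPEX = 6.0e6  # dollars per year

scenarios = {
    "25,000 cores at 70% average utilization": (25_000, 0.70),
    "30,000 cores at 90% average utilization": (30_000, 0.90),
}

for label, (cores, utilization) in scenarios.items():
    delivered = cores * HOURS_PER_YEAR * utilization   # core hours per year
    print(f"{label}: {delivered / 1e6:.0f}M core hours, "
          f"${ANNUAL_OPEX / delivered:.3f} per core hour")
# -> roughly $0.039 and $0.025 per core hour, bracketing the quoted range.
```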

Universities are more likely to commit such institutional funds if Federal funding agencies (NSF, NIH, DoE, NASA, DoC) provide incentives by way of targeted support for UBCC.

15 The case for UBCC: why Federal agencies need to provide direct support

• Incentive for institutional investment: research universities will leverage targeted federal funding and commit more of their own funds to supporting computational and data-driven research and teaching.
• Efficient use of resources: UBCC, with its campus-level computing cloud, has proven to be highly optimal and cost-effective in meeting a large portion of computing needs across a range of disciplines. Breakthrough science is becoming possible on small-to-medium sized systems. The research community at campuses frequently relies on campus-based HPC systems and staff to speed up their development and discovery cycle. Campus-based HPC systems are designed, and their software stacks frequently fine-tuned, in response to the computational needs of the local research community and with the sole goal of increasing the computational productivity of faculty and students.
• Seed computational science education: UBCC is a critical enabler, providing hands-on advanced training to students and faculty. UBCC nurtures sustained interest in research computing and HPC technologies among faculty and graduate students and, more significantly, reaches the undergraduate student community.
• Workforce development: There is a widening skills gap, i.e., a shortage of people with disciplinary knowledge, skills in scalable algorithm and code development for large systems, and the ability to think across disciplinary boundaries and integrate modeling and computational techniques from different areas. There is an equally pronounced shortage of skilled people in system administration, in building and deploying various parts of the software stack, in code scaling, and in the adoption of large-scale computing in disciplinary areas. Support for UBCC will go a long way toward addressing the shortage of skilled personnel and help with wider and deeper adoption of HPC technologies in both academia and industry. UBCCs have a critical role to play in developing the next generation of HPC professionals and in moderating the outsourcing of engineering services when it is due to a shortage of skilled personnel in the US.
• Industrial outreach: UBCC is uniquely positioned and highly effective in forging partnerships between industry and academia to make regional businesses globally competitive. It can leverage its CI assets, couple them with what industry may have, and in effect build "local collaboratories" around which faculty, students, and industrial R&D personnel can collaborate. In doing so, UBCC will support the development of advanced modeling technologies, help overcome the barrier of "high perceived cost" that is preventing widespread adoption by small and medium-sized businesses, and drive the adoption of modeling and simulation as an integral part of their research and product development cycles. UBCC can provide large-scale computing services and continuity of contact between faculty/students and industry personnel.
• Build a healthier HPC ecosystem: funding more UBCCs will increase the diversity and number of participants, the rate of innovation, and the number of companies in this important industry sector, where US companies are global leaders.
