SUSE® High Performance Computing Roadmap and Update
Kai Dupke, Senior Product Manager, SUSE Linux Enterprise, [email protected]
Meike Chabowski, Senior Product Marketing Manager, SUSE Linux Enterprise, [email protected]
History Future
SUSE – Strong in HPC Market!
SUSE® HPC
SUSE: since 1992, strong presence in Top500
• Multi- and many-core processor support
‒ Intel, AMD, POWER …..
• Technology
‒ Kernel 3.x
‒ Lustre File System
‒ Ceph storage platform
‒ Highly scalable - up to 4096 cores
• Cooperation
‒ IBM, NEC, SGI, Cray, HP, Cisco, Dell …..
• Academic and research
‒ LRZ / SuperMUC
‒ BSC / MareNostrum
‒ Tokyo Institute of Technology
‒ Beijing Computing Center
‒ NASA …..
• High Productivity Computing
‒ Total
‒ Baker Hughes
‒ Texas Instruments …..
Overview
HPC Overview
SUSE® High Performance Computing
• Solving computational, data-intensive, or numerically-intensive tasks
• Reducing the time and effort required to set up and maintain HPC clusters
• Ensuring that all components of the HPC stack work together
HPC Development
• Yesterday
‒ Academia and Research
• Today
‒ Academia and Research
‒ Financial Services
‒ Oil and Gas
‒ Semiconductor
‒ Life Sciences
‒ Manufacturing
• Tomorrow
‒ Departmental and workgroup clusters
‒ High Productivity Computing
High Productivity Computing
Hollywood and HPC
Big Data – or HPC?
HPC Market
Linux Preferred for HPC
• Linux
‒ runs on more than 90% of the world's top 500 supercomputers*
‒ is used by nearly 90% of general clusters
‒ is used in the majority of HPC systems, from smaller departmental implementations to larger, integrated cluster solutions
* Source: top500.org, June 2013
Split Market
Commercial: High Productivity Computing
• ROI and reliability are key
• Generic workloads
• Data Center support
• Commodity hardware
Scientific: Top 500-class
• Lighthouse projects
• Highly specialized application
• Government sponsored
• Often self-supported by academic staff
• Specialized hardware
Market Segmentation
• Super Computer ('top 500'): budget >500K$, HPC Ready +++
‒ Key drivers: special build HW, only performance counts, self-supported, partner supported
‒ GTM: HPC-IHV
• Divisional: budget <500K$, HPC Ready +++
‒ Key drivers: customized HW, partner driven, partner supported, SUSE supported
‒ GTM: IHV, ISV, SI
• Departmental: budget <250K$, HPC Ready ++
‒ Key drivers: commodity HW, business driven, SUSE supported
‒ GTM: Channel, SUSE
• Work Group: budget <100K$, HPC Ready +
‒ Key drivers: customer driven, home brewed
‒ GTM: Channel, Shop
SUSE Linux Enterprise HPC
Why Linux?
• Open Source benefits
‒ Easy to customize, maintain and improve
• Innovation
‒ Beowulf Clusters “born” on Linux
• Modularity
‒ GUI overhead not required
‒ Appliance form factors
• Linux Standards
‒ Large base of tools, including remote management
‒ Hardware availability
‒ Large vendor ecosystem surrounding Linux HPC clusters
Why SUSE® Linux Enterprise Server For High Performance Computing
• Early player in HPC, pushing innovation and new technologies
• Highly reliable, interoperable and manageable server operating system
• Built to power mission-critical workloads in physical, virtual and cloud environments
• The natural successor to UNIX, backed by proven services for UNIX migration
• Special features to improve performance
• Backed by established ecosystem – support and certificates
• The only Linux recommended by Microsoft
SUSE Additional Features
• Up-to-date 3.x Linux Kernel for optimal performance
• CPU Management and System Activity
‒ CPUset System, CPUset command line tool
‒ Sysstat package
‒ IRQbalance
• OpenFabrics Enterprise Distribution (OFED)
‒ Remote Direct Memory Access (RDMA) switched fabric technologies, high-speed data transport technologies for server and storage connectivity
• SystemTap, LTTng 2.0
• Packaged Lustre
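The CPUset tooling listed above assigns sets of CPUs to groups of tasks. As a minimal sketch, assuming the Linux kernel's usual CPU list notation (for example `0-3,8`) as it appears in cpuset `cpus` files, the helper below expands such a string into individual CPU numbers. The function name `parse_cpu_list` and the sample values are illustrative assumptions and not part of any SUSE tool.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as '0-3,8,10-11' into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty fields, e.g. an empty cpuset
        if "-" in part:
            first, last = part.split("-", 1)
            first, last = int(first), int(last)
            if last < first:
                raise ValueError(f"descending range: {part!r}")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


if __name__ == "__main__":
    # Hypothetical cpuset content for a compute partition.
    print(parse_cpu_list("0-3,8,10-11"))
```

A parser like this is mainly useful for inventory or monitoring scripts that need to check which cores a job partition may use.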
SUSE Advanced I/O Processing
• Asynchronous I/O (AIO)
‒ Input/output processing that permits other processing to continue before the transmission has finished
• Modular I/O Scheduler
‒ The algorithm most suitable for the workload can be chosen dynamically
• Multi-core/hyper-threading processor support
‒ Execute threads in parallel within each individual processor
‒ Supports up to 8192 cores per system
• Intel I/O Acceleration
‒ Offloads work from the CPU to the network card, allowing the system to continue processing data while I/O is taking place
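The modular I/O scheduler can be inspected per block device through sysfs, where the file `/sys/block/<device>/queue/scheduler` lists the available schedulers and marks the active one in square brackets. The following is a minimal sketch of reading that file; the helper names and the device name `sda` are assumptions for illustration.

```python
def parse_scheduler_line(line):
    """Return (active, available) from a line such as 'noop deadline [cfq]'."""
    names = []
    active = None
    for token in line.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the active one
            active = token
        names.append(token)
    return active, names


def read_schedulers(device, sysfs_root="/sys/block"):
    """Read the scheduler list for one block device (Linux only)."""
    path = f"{sysfs_root}/{device}/queue/scheduler"
    with open(path) as handle:
        return parse_scheduler_line(handle.read())


if __name__ == "__main__":
    try:
        print(read_schedulers("sda"))  # 'sda' is a hypothetical device name
    except OSError as error:
        print(f"could not read scheduler: {error}")
```

Switching the scheduler at runtime is done by writing one of the listed names to the same file as root, which is what makes the choice dynamic.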
Recent Enhancements
• Storage
‒ Support for btrfs
‒ Improved support for iSCSI and FCoE
‒ Major filesystem performance increases
• Management
‒ Faster, more powerful control groups for resource isolation
‒ Improved power management
• Kernel
‒ Newest processors and chipsets
‒ Better idle-load balancing
‒ Transparent huge pages
‒ Improved scaling of incoming network traffic
‒ Up to 8192 cores
• Network
‒ Higher network throughput
‒ Added tunables in the IP stack (for lower latency)
Customers, Partners
Customers and Partners
Customers
Partners
Fionn – SUSE benefits
• Irish Centre for High-End Computing (ICHEC)
• Power efficiency
‒ 1st for x86 in top500 (June 2014)
• Winning partnership
‒ SGI, SUSE, Intel working together
• 3 use cases
‒ Thin: latest Intel Ivy Bridge
‒ Fat: large shared-memory
‒ Hybrid: Xeon Phi & NVIDIA Tesla
“The stability is impressive”
“SUSE Linux Enterprise Server doesn’t get in the way of the computational workload”
“... great tools for set up and configuration, but gives us the flexibility to use other tools, which simplifies maintenance.”
“In our view, ... very well suited to high-performance computing.”
– Niall Wilson, Infrastructure Manager, ICHEC
Intel Cluster Ready Program
• Designed to simplify purchasing, deployment and management of HPC clusters
• SUSE Linux Enterprise Server is Intel Cluster Ready and powers many certified Intel Cluster Ready systems
• Intel® Cluster Ready “recipes” are available with SUSE Linux Enterprise Server
‒ Reference designs to help hardware vendors, platform integrators, and system integrators design and build certified Intel Cluster Ready systems
Business Update
Simplify Projects!
• Simplified model
‒ Only the number of socket pairs matters
‒ Socket pairs are accumulated per cluster
‒ Head nodes and compute nodes are treated equally
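The counting rule above can be sketched as a small calculation. This is only an illustration under stated assumptions: sockets are summed across the whole cluster before pairing, an odd total is rounded up to the next pair, and the sample cluster layout is invented. None of these details are confirmed by the slide, and actual subscription terms may differ.

```python
import math


def socket_pairs(nodes):
    """Count socket pairs for a cluster.

    nodes: iterable of (sockets_per_node, node_count) tuples. Head nodes and
    compute nodes are counted the same way, so the node role is not recorded.
    """
    total_sockets = sum(sockets * count for sockets, count in nodes)
    # Assumption: an odd number of sockets rounds up to a full pair.
    return math.ceil(total_sockets / 2)


if __name__ == "__main__":
    # Hypothetical cluster: 1 head node and 16 compute nodes with 2 sockets
    # each, plus 2 large-memory nodes with 4 sockets each.
    cluster = [(2, 1), (2, 16), (4, 2)]
    print(socket_pairs(cluster))
```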
Keep It Running!
• SUSE Vendor Support
‒ All levels of support for the whole system
‒ Maintenance, Standard, Priority
• Intel Enterprise Lustre Support
‒ Get Lustre support from Intel/Whamcloud
Technical Update
SUSE Linux Enterprise Server 12
• 4 major virtualization technologies
‒ Xen, KVM, LXC, Docker¹
• Workload management with systemd
‒ Prioritization with cgroups
• Tracing tools for software optimization
‒ LTTng with graphical frontend
¹ Docker provided as technical preview – see release notes
LTTng viewer
SUSE Linux Enterprise Server 12
• Machinery module
‒ KIWI image creation
‒ CFEngine, Puppet
‒ System verification & analysis
• Updated Stack
‒ Kernel, Tools, pNFS, OFED, Open MPI, chipset support
Machinery
www.suse.com/products/server/hpc.html
Backup
SuperMUC
SuperMUC – SUSE benefits
• Great support experience
‒ Cooperation for more than 15 years
‒ Backed by SUSE's winning support
• Support for Itanium2 and x86
‒ Smooth migration of old to new system
‒ No additional staff training needed
• Easy deployment methods
‒ SUSE's AutoYaST used today
‒ Other SUSE offerings – SUSE Cloud, SUSE Manager – considered
“We have relied on SUSE Linux Enterprise Server for 15 years, and have always been very satisfied. The SUSE team is close at hand, should we require support or guidance. We have received highly competent support over the years, and look forward to collaborating with them.”
– Dr. Herbert Huber, Division Head of Supercomputing, Leibniz Rechenzentrum
LRZ - Leibniz Rechenzentrum
Europe’s supercomputer runs SUSE Linux Enterprise Server
Business challenge: LRZ is part of the Gauss Centre for Supercomputing (GCS), which operates the most powerful HPC infrastructure in Europe, and needs to provide researchers across Europe with a reliable and powerful HPC platform, which enables users to make faster progress in their complex research projects. To reduce the environmental impact of HPC, the institution aimed to improve energy efficiency and to leverage established automation solutions to maximise the efficiency and manageability of the new supercomputing platform.
Solution: Working with SUSE and IBM, LRZ implemented SuperMUC with approx. 9,400 general purpose computing nodes, a peak performance of three Petaflop/s, comprised of 155,000 Intel Xeon processor cores and more than 300 TB main memory. LRZ chose to run SuperMUC on SUSE Linux Enterprise Server, leveraging SUSE’s proven HPC expertise and leading automation tools such as AutoYaST, which allows systems to be installed without manual intervention.
Benefits:
• Completed easy and smooth migration from previous Itanium 2 infrastructure to new x86 processor architecture
• Considerably simplified configuration and automation of the new system, using the automation capabilities of AutoYaST (integrated with SLES)
• Improved the energy efficiency: SuperMUC delivers approx. 20 times more performance per watt than its predecessor
• Boosted overall performance by a factor of 60
“We have relied on SUSE Linux Enterprise Server for 15 years, and have always been very satisfied. The SUSE team is close at hand, should we require support or guidance. We have received highly competent support over the years, and look forward to collaborating with them.”
– Dr. Herbert Huber, Division Head of Supercomputing, Leibniz Rechenzentrum
SuperMUC – Facts
• 60x faster than its predecessor, one of the fastest HPC systems in Europe
• 20x better performance per watt, providing green HPC
• > 155,000 Intel Xeon processor cores, migration from Itanium2 to x86
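The headline figures quoted for SuperMUC can be related to each other with a little arithmetic. The sketch below uses only numbers stated in this deck (approx. 9,400 nodes, 155,000 cores, 3 Petaflop/s peak, more than 300 TB memory, 60x performance, 20x performance per watt). The derived values are rough estimates because the inputs are rounded, decimal units are assumed, and the implied power ratio holds only if both factors refer to the same measurement basis.

```python
def supermuc_ratios(nodes=9_400, cores=155_000, peak_flops=3e15,
                    memory_bytes=300e12, speedup=60, perf_per_watt_gain=20):
    """Derive per-node and per-core estimates from the rounded slide figures."""
    return {
        "cores_per_node": cores / nodes,
        "gflops_per_core": peak_flops / cores / 1e9,
        "memory_gb_per_node": memory_bytes / nodes / 1e9,
        # performance = (performance per watt) * power, so the power ratio
        # is the speedup divided by the efficiency gain.
        "implied_power_ratio": speedup / perf_per_watt_gain,
    }


if __name__ == "__main__":
    for name, value in supermuc_ratios().items():
        print(f"{name}: {value:.1f}")
```

Read this way, a 60x faster system at 20x better efficiency would still draw roughly three times the power of its predecessor, which is one reason cooling cost matters for such an installation.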
SuperMUC – System Overview
SuperMUC – Business Aspects
• Hot Water Cooling – reduce cooling cost
‒ Use free air cooling
‒ Use of system heat for heating and technical processes
• RAS driven – high system availability
‒ Fully maintained SUSE Linux Enterprise Server
‒ Full support via IBM and SUSE
• Automated deployment – less management cost
‒ Full use of SUSE's AutoYaST feature
HPC Stack
Challenge
HPC market still developing
Stack components provided by various vendors
Some stack components run in parallel
Mix of small and big vendors
Segmented into commercial and scientific
HPC Stack
• Application
• Queuing / Management Software & Tools
‒ PBS Pro, Moab, IBM LSF, Bright CM
• Message Passing Interface (MPI)
‒ Parastation, Intel, HP, SGI, MPICH, openMPI
• Storage
‒ EXT3, XFS, BTRFS, OCFS2, NFS, pNFS, IBRIX, GPFS, pNFS, Lustre, cephFS
• Network
‒ 10G, OFED, TCP offload
• SUSE Linux Enterprise Server
• Hardware
Legend: SUSE supported, SUSE Partner, SUSE future
Storage Options
File Systems – Today
• Local storage
‒ Maintain existing capabilities (e.g. EXT3, XFS)
‒ Full btrfs support, improving manageability
‒ Maximum flexibility for customers
• Expand network filesystem capabilities (NFSv4.x/pNFS)
‒ Improve performance, reliability and security
‒ pNFS client support now, server support in a later version of SUSE Linux Enterprise
File Systems – BTRFS
• Integrated Volume Management
• Support for copy on write
• Powerful snapshot capabilities
• Scalability
• Other Capabilities:
‒ Compression
‒ Data integrity (checksums)
‒ SSD optimization
• Status:
‒ SLE 11: Fully supported
‒ SLE 12: Planned as default file system
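The data integrity item above rests on a simple idea: store a checksum for every block when it is written and compare it again when the block is read. The sketch below illustrates that idea only. It uses `zlib.crc32` from the Python standard library as a stand-in checksum, and the block size and function names are assumptions; it does not reproduce the Btrfs on-disk format or its actual checksum algorithm.

```python
import zlib

BLOCK_SIZE = 4096  # illustrative block size, not taken from the slides


def checksum_blocks(data, block_size=BLOCK_SIZE):
    """Return one CRC-32 value per block of data."""
    return [zlib.crc32(data[offset:offset + block_size])
            for offset in range(0, len(data), block_size)]


def find_corrupt_blocks(data, expected, block_size=BLOCK_SIZE):
    """Return the indices of blocks whose checksum no longer matches."""
    actual = checksum_blocks(data, block_size)
    return [index for index, (now, stored) in enumerate(zip(actual, expected))
            if now != stored]


if __name__ == "__main__":
    original = bytes(range(256)) * 64       # 16 KiB of sample data, 4 blocks
    stored = checksum_blocks(original)      # checksums recorded at write time
    damaged = bytearray(original)
    damaged[5000] ^= 0xFF                   # flip bits inside the second block
    print(find_corrupt_blocks(bytes(damaged), stored))
```

Keeping checksums per block rather than per file is what lets a file system pinpoint the damaged region instead of rejecting the whole file.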
File Systems – CEPH
• Ceph is a scalable open source storage platform comprising an object store (RADOS), a block store (RBD), a POSIX-compatible distributed file system (Ceph FS), and an Amazon S3 integration
• Ceph has been integrated with OpenStack and is included in the Linux kernel
Distributed Storage System Market
• IDC predicts that by 2015, combined spending for public and private cloud storage will be $22.6 billion worldwide
• Gartner predicts that by 2016
‒ more than one third of consumer data will be stored in cloud storage
‒ storage will grow from 329 exabytes in 2011 to 4.1 zettabytes (12x)
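The quoted growth from 329 exabytes to 4.1 zettabytes can be checked with a short calculation, which also gives the annual growth rate it implies. This is a sketch assuming decimal units (1 zettabyte = 1,000 exabytes) and steady compound growth over the five years from 2011 to 2016; the function name is illustrative.

```python
def growth(start_eb, end_eb, years):
    """Return (overall factor, implied compound annual growth rate)."""
    factor = end_eb / start_eb
    annual_rate = factor ** (1 / years) - 1
    return factor, annual_rate


if __name__ == "__main__":
    # 4.1 zettabytes expressed in exabytes, assuming 1 ZB = 1,000 EB.
    factor, annual_rate = growth(329, 4_100, 2016 - 2011)
    print(f"overall factor: {factor:.1f}x")
    print(f"implied annual growth: {annual_rate:.0%}")
```

The overall factor comes out slightly above 12, consistent with the rounded "12x" on the slide.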
File Systems – Cluster
• OCFS2
‒ Superior cluster file system for up to 32 nodes
‒ Scalable network access via CTDB
‒ Used for big storage and user directories
• GPFS
‒ 3rd party offering by IBM
• IBRIX
‒ 3rd party offering by HP
File Systems – Lustre
• Maintenance Release 2.1
‒ Available for SUSE Linux Enterprise Server 11 SP1+
• Maintenance Release 2.4
‒ Available for SUSE Linux Enterprise Server 11 SP2+
• Client already accepted by Whamcloud (http://downloads.whamcloud.com)
• SUSE sponsored and developed port provided to the community
http://drivers.suse.com/lustre/
Learn More
www.suse.com/products/server/hpc.html
Thank you.
Corporate Headquarters
Maxfeldstrasse 5
90409 Nuremberg
Germany
+49 911 740 53 0 (Worldwide)
www.suse.com
Join us on: www.opensuse.org
Unpublished Work of SUSE. All Rights Reserved. This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.