IDC TECHNOLOGY SPOTLIGHT

An Approach for Designing HPC Systems with Better Balance and Performance

April 2016

By Steve Conway; Earl C. Joseph, Ph.D.; and Bob Sorensen

Sponsored by Intel Corporation

The market for high-performance computing (HPC) systems has been one of the fastest-growing IT markets. The global HPC systems market more than doubled from 2001 to 2014 — growing from $4.8 billion to $10.2 billion — and IDC predicts that it will reach $15.2 billion in 2019. (Add software, storage, and services and the 2019 forecast expands to $31.3 billion.) An almost insatiable appetite for higher application performance has fueled spending growth among existing users while attracting thousands of new adopters, including small and medium-sized businesses (SMBs) and large commercial firms with big data analytics needs that enterprise technology alone can't handle well.

Demand for HPC is robust and growing, but to meet this demand fully, the HPC community needs to address a daunting set of requirements — application compatibility, larger system sizes (soon to include exascale), mixes of compute-intensive and data-intensive workloads, newer environments (cloud computing), improved energy efficiency and reliability/resiliency, better command and control of all this functionality via the software stack, deeper memory hierarchies, and alleviation of the data movement (I/O) bottleneck via better interconnect fabrics. In recent years, reference architectures — flexible master blueprints — have arisen to help developers create HPC systems that are integrated, performant, and responsive to these complex requirements. This IDC Technology Spotlight discusses this trend and uses the Intel Scalable System Framework (SSF) to illustrate progress to date and where things are headed.

Sustained, Balanced Performance: The Holy Grail for HPC Users

It's no accident that HPC stands for high-performance computing. Since the start of the supercomputing era in the 1960s, a fast-growing contingent of scientists, design engineers, and advanced data analysts have turned to HPC systems to run their problems at the highest available speed and resolution.

Especially during the past 15 years, the peak performances of HPC systems — their hypothetical speed limits — have skyrocketed. On the November 1999 list of the world's most powerful supercomputers (www.top500.org), the number 1 system boasted 9,632 cores and peak performance of 3.2 teraflops (TF). Fast-forward 16 years to the November 2015 list and the number 1 supercomputer featured 3.1 million cores and peak performance of 54.9 petaflops (PF). That's a 324-fold increase in the core count and a 17,156-fold leap in peak performance (see Table 1).

Sustained performance — actual speed on end-user applications — has also made impressive advances but with varied success. "Embarrassingly parallel" codes still manage to exploit substantial fractions (>20%) of even the largest supercomputers, but many applications do not fit that description.
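Many applications fall short because some portion of the work remains serial or only weakly parallelized. A minimal Amdahl's-law sketch in Python (not drawn from the IDC study; the parallel fractions and core counts below are illustrative assumptions) shows how sharply even a small serial fraction caps the share of a large machine a code can exploit:

```python
# Amdahl's law: ideal speedup on n cores when a fraction p of the work parallelizes.
# The parallel fractions and core counts below are illustrative assumptions only.

def amdahl_speedup(p: float, n: int) -> float:
    """Ideal speedup for parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

for p in (0.999, 0.99, 0.95):              # parallel fraction of the code
    for n in (1_000, 100_000, 3_120_000):  # core counts up to a Top500-class system
        s = amdahl_speedup(p, n)
        print(f"p={p:5.3f}  cores={n:>9,}  speedup={s:10,.0f}  "
              f"efficiency={s / n:7.3%}")
```

Even with 99% of the work parallelized, the ideal speedup tops out near 100x, a tiny fraction of a three-million-core system's capacity, which is consistent with the scaling statistics IDC reports below.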

TABLE 1

Growth in System Size: Top Supercomputer on Top500 List

                              November 1999    November 2015    Gain Factor (X)
Peak performance (TF)         3.2              54,900           17,156
Core count                    9,632            3,120,000        324
Peak performance/core (GF)    0.33             17.6             53

Source: IDC, 2015
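The gain factors in Table 1 follow directly from the two systems' core counts and peak ratings. As a quick arithmetic check, a minimal Python sketch reproduces them:

```python
# Reproduce the Table 1 gain factors from the underlying Top500 figures.
nov_1999 = {"peak_tf": 3.2,    "cores": 9_632}      # No. 1 system, November 1999
nov_2015 = {"peak_tf": 54_900, "cores": 3_120_000}  # No. 1 system, November 2015

peak_gain = nov_2015["peak_tf"] / nov_1999["peak_tf"]              # ~17,156x
core_gain = nov_2015["cores"] / nov_1999["cores"]                  # ~324x
per_core_1999 = nov_1999["peak_tf"] * 1_000 / nov_1999["cores"]    # ~0.33 GF/core
per_core_2015 = nov_2015["peak_tf"] * 1_000 / nov_2015["cores"]    # ~17.6 GF/core

print(f"Peak gain: {peak_gain:,.0f}x, core gain: {core_gain:.0f}x, "
      f"per-core gain: {per_core_2015 / per_core_1999:.0f}x")
```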

IDC's 2015 worldwide HPC end-user study found that 51.8% of all codes were running on one node or less, and 10.9% of codes ran on just a single core. Only 9.1% of applications were being run on more than 1,000 cores, and just 2.5% were scaling to 10,000 or more cores.

Some of the responsibility for limited sustained performance rests with the application codes themselves. Extreme examples include codes that were written decades ago to run on single-processor, central memory vector supercomputers. Many of these codes have been modified but were never fundamentally rewritten to exploit today's highly parallel HPC systems efficiently. Boosting sustained performance for these HPC codes and many others depends on more than just exposing enough parallelism to exploit more processor cores on CPUs or accelerators/coprocessors. These codes also need strong I/O capabilities to keep the processors supplied with data, along with a constellation of other requirements (see the following section).

Addressing System Imbalance and Growing Complexity

To move performance on end-user applications forward will require the HPC community to address two key challenges: the imbalanced architectures of today's HPC systems and the complex and growing set of requirements users have for these systems.

▪ Not just for exascale users. Although much of the discussion in the HPC community lately has focused on advancing capabilities to prepare for exascale computing in the 2020–2024 era and beyond, improvements in system balance and in managing system complexity stand to benefit the entire HPC community, with systems ranging from a few nodes up to the largest supercomputers.

Processors Are Not the Performance Problem

The Top500 figures noted in the preceding section demonstrate that processors are not the problem. System peak performance is usually just the peak performance of one processor times the number of processors in the system — and the figures show that the peak performance of processors has advanced strongly in the past 15 years. Today, the processors — whether CPUs or accelerators/coprocessors — are nearly always the fastest elements in any HPC system, by far. The system's sustained performance on a range of user applications typically depends much more on the ability of the rest of the system, especially the memory subsystem and interconnect, to keep pace with the processors.

The global HPC user community is well aware of this system-level imbalance problem (the "memory wall" and "I/O wall") and has urged vendors to alleviate this worsening bottleneck by improving the capabilities of the nonprocessor parts of the system. That's important, because system-level imbalance not only constrains the performance of user applications but also throttles organizational productivity and the value (return on investment [ROI]) obtained from increasingly substantial investments in HPC resources. The rapid growth in high-performance data analysis — big data needing strong memory and I/O capabilities for data-intensive simulation and analytics — is exacerbating the imbalance issue.
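One common way to quantify this imbalance is the machine balance ratio of sustainable memory bandwidth to peak floating-point rate. The sketch below uses hypothetical round numbers, not measurements of any particular system, to show how that ratio caps achievable performance for bandwidth-hungry codes:

```python
# Machine balance: bytes of memory bandwidth available per peak flop.
# The node parameters are hypothetical round numbers, not a specific product.

peak_gflops_per_node = 1_000.0   # assumed peak rate: 1 TF per node
mem_bw_gbs_per_node = 100.0      # assumed sustainable memory bandwidth: 100 GB/s

balance = mem_bw_gbs_per_node / peak_gflops_per_node   # bytes per flop
print(f"Machine balance: {balance:.2f} bytes/flop")

# A code that must move, say, 1 byte from memory for every flop it performs is
# capped by bandwidth, not by processor speed.
code_bytes_per_flop = 1.0
achievable_fraction = min(1.0, balance / code_bytes_per_flop)
print(f"Bandwidth-bound ceiling: {achievable_fraction:.0%} of peak")
```

In this illustrative case, a code that needs one byte of memory traffic per flop cannot exceed roughly 10% of peak performance, regardless of how fast the processors are.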

Advancing Interconnect Performance

An important strategy for improving system balance is to enable interconnect fabrics to move data with higher bandwidth and lower latency, above all to keep the processors reasonably busy. With that goal in mind, interconnect-related R&D has heated up in recent years — and not only at large interconnect vendors such as Mellanox and Intel. Extoll (Germany) and Numascale (Norway) already market their interconnect products, and Atos (Bull) is developing an interconnect fabric for its HPC systems. In addition, IDC expects Cray to offer a special variant of the Intel Omni-Path interconnect, derived from the technical collaboration between the two companies.
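A simple latency-plus-bandwidth model of message transfer time illustrates why both parameters matter. The fabric figures below are illustrative assumptions, not published specifications for any of the products named above:

```python
# Latency-bandwidth model of point-to-point message time: t = latency + size / bandwidth.
# The fabric parameters below are illustrative assumptions, not product specs.

LATENCY_S = 1.0e-6       # assumed end-to-end latency: 1 microsecond
BANDWIDTH_BPS = 12.5e9   # assumed bandwidth: 12.5 GB/s (~100 Gbps)

def message_time(size_bytes: float) -> float:
    """Estimated transfer time in seconds for one message."""
    return LATENCY_S + size_bytes / BANDWIDTH_BPS

for size in (8, 1_024, 1_048_576):   # 8 B, 1 KiB, 1 MiB messages
    t = message_time(size)
    print(f"{size:>9,} B  ->  {t * 1e6:8.2f} us  "
          f"(latency share: {LATENCY_S / t:6.1%})")
```

Small messages are dominated by latency while large transfers are dominated by bandwidth, so a fabric must improve on both fronts to keep processors fed; the same model also describes the overhead incurred when work is offloaded to an accelerator and results are shipped back.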

Breaking Down the Memory Wall

Recognizing the significant impact an advanced memory architecture would have on HPC applications, Intel has been working for some time on technology innovation to reduce the lag time between processors and data.

While Intel has not released a wide range of details about its new memory technologies, it has disclosed two significant milestones that promise substantial performance improvements for HPC applications.

In July 2015, Intel and Micron unveiled 3D XPoint technology, a nonvolatile memory that will benefit both compute-intensive applications and data-intensive applications requiring fast access to large sets of data. This is the first new memory category since the introduction of NAND flash in 1989. Manufacturer-supplied proof points suggest the new 3D XPoint technology offers nonvolatile memory speeds up to 1,000 times faster than those of NAND, currently the most popular nonvolatile memory in the marketplace.

In August 2015, Intel announced its Intel Optane Technology to combine the 3D XPoint nonvolatile memory media with the company's advanced system memory controller, interface hardware, and software IP as the foundation for a range of future products, including a new line of Intel DIMMs designed for next-generation platforms.

Addressing Complexity via the Software Stack

Although imbalance (extreme compute centrism) is arguably the biggest performance problem facing HPC systems today, it is not the only major problem. Another important issue for buyers and users, as noted previously, is managing the complex, growing set of requirements for operating these systems. These mounting requirements have made it more challenging to present users with an HPC resource that is comprehensive, coherent, and highly performant on their applications.

The system management requirements fall into the following main categories:

▪ Heterogeneous workloads (floating point–based simulation, integer-based analytics)

▪ The movement from synchronous applications to asynchronous workflows

▪ Rapid growth in average system sizes and component counts

▪ Heterogeneous processing elements (CPUs, coprocessors/accelerators)

▪ Heterogeneous environments (on-premise datacenters, public/hybrid clouds)

▪ Reliability/resiliency at scale

▪ Power efficiency and power awareness ("power steering")

▪ Cybersecurity

The HPC community is also dealing with the emerging trend toward deeper, more heterogeneous memory hierarchies (e.g., solid state drives [SSDs], on-package memory, NVRAM, and burst buffers). The chief responsibility for managing this complexity falls to the HPC software stack (i.e., software between the hardware and the application layer).

During the past decade, HPC stacks have had to become much more sophisticated and intelligent. But the evolving list of requirements presents new challenges for stack developers and end users. For example, as new versions of open source components of the stack are released and inserted into an HPC software stack, there is an ongoing need to integrate, test, and validate the aggregate system software. Today, this effort is duplicated by multiple HPC system builders, including OEMs, ISVs, academia, labs, and end users.
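As a hedged illustration of that integrate-test-validate burden, the sketch below checks an installed stack against a combination of component versions assumed to have been tested together. All component names and version numbers are invented for this example and do not describe the actual contents of OpenHPC or any vendor stack:

```python
# Hypothetical illustration: confirm that an installed HPC stack matches a
# combination of component versions that was integrated and tested together.
# Component names and versions are invented for this sketch.

VALIDATED_BASELINE = {
    "resource_manager": "16.05",
    "mpi_library": "3.1",
    "parallel_fs_client": "2.8",
    "provisioning_tool": "1.4",
}

def validate_stack(installed: dict) -> list:
    """Return mismatches between installed components and the validated baseline."""
    problems = []
    for component, expected in VALIDATED_BASELINE.items():
        found = installed.get(component)
        if found is None:
            problems.append(f"{component}: missing")
        elif found != expected:
            problems.append(f"{component}: {found} installed, {expected} validated")
    return problems

installed_stack = {
    "resource_manager": "16.05",
    "mpi_library": "3.2",        # newer than the validated combination
    "provisioning_tool": "1.4",
}

for issue in validate_stack(installed_stack) or ["stack matches validated baseline"]:
    print(issue)
```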

Rather than continuing to face these mounting software stack challenges alone, many HPC vendors are now partnering with the open source community. One such collaboration is OpenHPC (www.openhpc.community), a collaborative initiative in which Intel acts as an important catalyst. "Open ecosystem" collaborations of this kind have several potential benefits for HPC vendors and the HPC community as a whole:

▪ Making use of a pretested and integrated open source HPC software stack for baseline functions enables vendors to focus their own R&D budgets and efforts on software development targeted at providing competitive differentiation.

▪ These partnerships typically result in two versions of the vendor's software: an open version that's free to use and a paid version for users who need special added capabilities or vendor support. The open version seeds the community with potential new users of the paid version — the vendors assume that a portion of users who become accustomed to employing their free version (and who may even be students at the time) will want the paid version later on.

Reference Architectures: Flexible Blueprints for Building Integrated, Performant HPC Systems

Because of the growing complexity and interdependence of the elements of HPC systems, it is crucial to take a holistic, system-level approach to addressing system performance and management issues. Reference architectures embody this approach by providing blueprints that, if followed, help ensure that the varied elements of the resulting HPC systems will interoperate not only compatibly but also with strong performance. The best reference architectures are also flexible, allowing users to substitute their own preferred software components for the default components, as long as the new components comply with the conventions specified in the reference architecture.

HPC reference architectures became more important when the early, "do it yourself" cluster period gave way to the present era of vendor-produced clusters and other distributed computing systems. Even then, users increasingly realized that it was easier and more economical to let vendors integrate and factory-test the hardware-software systems. Relying on vendors was especially important for sites that were relatively new to HPC. Today, however, the challenges associated with producing coherent, performant HPC systems are far more daunting, and even the largest vendors rely on reference architectures for building systems — including the world's largest, most powerful supercomputers for the most experienced user sites. Reference architectures help users deal with the escalating pace of new technology development. That in turn helps user organizations accelerate times to solution and product development cycles.

Intel Scalable System Framework

Over the past five or six years, Intel has made a concerted effort to grow and acquire significant HPC system-level expertise and talent. The company has been evolving from the stereotypical image of workers in bunny suits who manufacture computer chips to an ecosystem-enabling organization, particularly in HPC. Intel Scalable System Framework, or Intel SSF, represents the fruits of that labor — an HPC architectural direction for developing high-performance, balanced systems that address issues of application compatibility, power efficiency, and resiliency while supporting a wide range of both compute-intensive workloads and data-intensive workloads.

Why Is Intel Doing This?

As noted, in recent years Intel has been evolving from an HPC processor supplier to an HPC ecosystem supplier. This evolution is a natural outgrowth of Intel's processor business. Intel processors have been massively successful in the worldwide HPC market — in 2014, 93.1% of HPC system revenue was for x86-based systems, and the vast majority of those processors came from Intel. This growth has brought Intel into close contact with a large percentage of the world's HPC sites, along with their requirements and issues. Intel realized some years ago that the major issues affecting HPC users were at the system level. The Intel Cluster Ready program, which provided an HPC reference architecture, was an early demonstration of Intel's transition to system-level thinking.

Intel SSF adopts a holistic, system-level perspective and aims to alleviate architectural imbalance and system management complexity — the HPC community's two major system-related issues — while maximizing application compatibility. It is important to keep in mind that Intel SSF is not a solo undertaking on Intel's part. On the contrary, it's a close collaboration between Intel, acting as catalyst, and a growing number of OEMs and other members of the global HPC community, including open source software developers.

Key Elements of the Intel Scalable System Framework

The main elements of Intel's collaborative SSF initiative include:

▪ Compute: Intel Xeon processors and Intel Xeon Phi processors (code name Knights Landing) are expected to advance performance in step with Intel's processor cadence.

▪ Fabric: The Intel Omni-Path Architecture (OPA) interconnect fabric. OPA will benefit from Intel's acquisition of interconnect technologies from Cray, QLogic, and Fulcrum. By closely integrating Intel processors and coprocessors, along with the Omni-Path Architecture — and by tightening this integration in Omni-Path successor fabrics — Intel expects to boost the performance and scalability of a wide spectrum of HPC end-user applications. The integration aims primarily to exploit peak performance increases on the Intel Xeon and Xeon Phi multicore processor road maps, but IDC expects the Intel fabric to benefit performance on general-purpose graphics processing units (GPGPUs) as well, although overhead would still be incurred with offloading to the accelerator. The Intel Omni-Path Architecture and its compatible successor fabrics target increased performance on both simulation workloads and advanced analytics workloads.

▪ Software: The Intel HPC system software stack. Intel will develop multiple, modular Intel-supported HPC software stack products based on the stack currently available for free in the OpenHPC technical project under the Linux Foundation. The Intel stack is designed to enable users to create solutions based on the Intel SSF reference architectures.

▪ Memory and storage: Intel Solutions for Lustre software, Intel Optane Technology–based SSDs, and 3D XPoint Technology. Intel and its collaborators are preparing to scale Lustre higher than it's ever scaled before — high enough to efficiently exploit Aurora, the most powerful future supercomputer announced to date. Intel reports that Intel Optane Technology–based solid state storage devices are up to seven times faster than other contemporary SSDs. Further, 3D XPoint Technology combines memory and storage in one nonvolatile device designed to be less expensive than DRAM and faster than NAND.

▪ Implementation guidelines: Intel will provide Intel SSF reference architectures, designs, and validation tools. These technical system specifications will include hardware and software bills of materials for Intel SSF–validated systems.

Strong Starting Momentum for Intel Scalable System Framework

Although Intel's SSF initiative is still fairly new, the list of OEMs that plan to make use of the SSF architectural direction is already impressive. It includes Colfax, Cray, Dell, Fujitsu Ltd., Hewlett Packard Enterprise, Inspur, Lenovo, Penguin Computing, SGI, Sugon, and Supermicro.

The Department of Energy (DOE)–funded Aurora supercomputer, expected to be the world's most powerful supercomputer when installed at Argonne National Laboratory (ANL), will use Intel SSF.

Opportunities

The Intel SSF initiative creates important opportunities for the company:

▪ Enhance the value of Intel processors. Intel's plan for increasingly tightening CPU-fabric integration over time aims to boost the real-world performance of Intel Xeon and Xeon Phi processors. Assuming Intel carries out this plan successfully, this tighter integration, in conjunction with new efficiencies targeted for the Intel HPC software stack, should enhance the value (ROI) of Intel processors by enabling a higher percentage of their peak performance to be exploited on end-user applications.

▪ Add to Intel's system-level thinking and fortify Intel's position as a system-level contributor to the global HPC community. Again, processors today are not the problem where HPC performance is concerned. The key problems exist at the system level. Intel has gained considerable system-level knowledge in recent years. Adding to this base of knowledge and experience through SSF-related collaborations with OEMs, the open community, and other HPC constituencies should enable Intel to establish its position as a leading system-level thinker and extend its position as a significant contributor to the HPC community. The Aurora project, in which Intel will partner with Cray and others to develop what will likely be the world's most powerful supercomputer, illustrates how the Intel Scalable System Framework can carry the company into system-level leadership roles.

▪ Help pioneer a new model for HPC software stack development. The Intel HPC software stack initiative will be among the first to rely heavily on the OpenHPC community that was launched just prior to SC'15 (www.openhpc.community) to help address the mounting, complex requirements for the stack. Assuming this initiative meets its goals, it will help introduce a new, more efficient model for developing software stacks and encourage software and hardware innovation in parallel.

Challenges

▪ Maintain strong relationships with OEMs, ISVs, and other collaborating HPC vendors. As Intel begins to provide some of the integration at the system level, it will be important for the company to maintain strong relationships with a large group of collaborating HPC vendors, especially OEMs. The impressive contingent of HPC OEMs that have already agreed to make use of Intel's SSF suggests that Intel is off to a strong start toward this important goal. The Intel Cluster Ready initiative attracted many ISV collaborators. Encouraging ISVs to migrate the Intel Cluster Ready catalogue of third-party software to Intel SSF is a challenge, but IDC believes that Intel's established relationships with ISVs will enable the company to meet this challenge.

▪ Commercialize open source software. IDC studies confirm that even the best open source software may require a major, multiyear effort to "harden" into a paid commercial product. Intel seems well aware of this situation.

Conclusion

Demand for HPC is growing robustly, but to exploit this demand, the HPC community needs to address a daunting set of requirements — larger system sizes (soon to include exascale), mixes of compute-intensive and data-intensive workloads, newer environments (cloud computing), improved energy efficiency and reliability/resiliency, better command and control of all this functionality via the software stack, and alleviation of the data movement (I/O) bottleneck via better interconnect fabrics. In recent years, reference architectures — flexible master blueprints — have arisen to help developers create HPC systems that are integrated, performant, and responsive to these complex requirements.

These architectures address part of the challenge, but not all of it. Intel Scalable System Framework ventures beyond reference architectures. It aims to provide a foundational direction for building HPC systems that deliver adequate balance, flexibility, and sustained performance for the challenging HPC workloads of today and tomorrow. Intel has already established robust momentum for Intel SSF, with a solid and growing list of OEM partners and a contract to use Intel SSF to build Aurora, a DOE-funded system that is the most powerful supercomputer ever announced.

The Intel SSF initiative contains many moving parts, each with its own complexity and challenges — processors, an interconnect fabric, memory, system software, and more. IDC observes that Intel has evolved in recent years from processor-centric thinking to system-level thinking. We believe that this evolution positions Intel well to make Intel SSF a strong success.

ABOUT THIS PUBLICATION

This publication was produced by IDC Custom Solutions. The opinion, analysis, and research results presented herein are drawn from more detailed research and analysis independently conducted and published by IDC, unless specific vendor sponsorship is noted. IDC Custom Solutions makes IDC content available in a wide range of formats for distribution by various companies. A license to distribute IDC content does not imply endorsement of or opinion about the licensee.

COPYRIGHT AND RESTRICTIONS

Any IDC information or reference to IDC that is to be used in advertising, press releases, or promotional materials requires prior written approval from IDC. For permission requests, contact the IDC Custom Solutions information line at 508-988-7610 or [email protected]. Translation and/or localization of this document require an additional license from IDC.

For more information on IDC, visit www.idc.com. For more information on IDC Custom Solutions, visit http://www.idc.com/prodserv/custom_solutions/index.jsp.

Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com
