Scaling Up Engineering Analysis using Windows HPC Server 2008

Agenda

• High Productivity for HPC
• HPC for ANSYS
• Benchmarks & Case Studies
• Discussion

Why Clusters?

• Clusters enable bigger simulations (more accuracy)
  – Typical FEA models require 4-8 GB RAM; today's more demanding models might consume 100 GB
  – Typical CFD models require 2-10 GB RAM; today's more demanding models might consume 50-100 GB
• Clusters deliver faster turnaround time
  – Clusters can yield nearly linear scaling: a typical simulation that runs for 8 hours on a single CPU becomes (ideal case) a 1-hour run on an eight-core cluster
• Clusters can scale up with demand
  – More simulations mean more design options; optimization studies might require 50-100 simulations
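The 8-hour-to-1-hour figure above assumes perfectly parallel work. As a minimal sketch of how real turnaround falls off when part of the run stays serial (the serial_fraction parameter and the use of Amdahl's law are illustrative assumptions, not something the slide states):

```python
def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Estimate parallel speedup with Amdahl's law.

    serial_fraction is the share of runtime that cannot be parallelized
    (a hypothetical parameter; the slide's ideal case corresponds to 0.0).
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

single_cpu_hours = 8.0  # the slide's single-CPU example
for frac in (0.0, 0.05, 0.10):
    hours = single_cpu_hours / amdahl_speedup(8, frac)
    print(f"serial fraction {frac:.0%}: ~{hours:.1f} h on an eight-core cluster")
# 0% serial -> 1.0 h (the ideal case); 5% -> ~1.4 h; 10% -> ~1.7 h
```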

High Productivity Computing

• Combined infrastructure: HPC and IT data centers merge, with streamlined cluster management
• Integrated desktop and HPC environment: users get broad access to multiple cores and servers
• Unified development environment: simplified parallel development

Microsoft's Vision for HPC

“Provide the platform, tools and broad ecosystem to reduce the complexity of HPC by making parallelism more accessible to address future computational needs.”

Reduced Complexity:
• Ease deployment for larger scale clusters
• Simplify management for clusters of all scale
• Integrate with existing infrastructure

Mainstream HPC:
• Address needs of traditional supercomputing
• Address emerging cross-industry computation trends
• Enable non-technical users to harness the power of HPC

Broad Ecosystem:
• Increase number of parallel applications and codes
• Offer choice of parallel development tools, languages and libraries
• Drive larger universe of end-users, developers, and system administrators

Windows HPC Server 2008

www.microsoft.com/hpc

Windows HPC Server 2008

• Complete, integrated platform for HPC clustering
• Built on top of the Windows Server 2008 64-bit operating system
• Addresses the needs of traditional and emerging HPC

Windows Server 2008 HPC Edition:
• Secure, reliable, tested
• Support for high performance hardware (x64, high-speed interconnects)

Microsoft HPC Pack 2008:
• Job Scheduler
• Resource Manager
• Cluster Management
• Message Passing Interface

HPC Server 2008:
• Integrated solution out-of-the-box
• Leverages investment in Windows administration and tools
• Makes cluster operation easy and secure as a single system

Windows HPC clusters on the Top500 list:
• Spring 2008, NCSA, #23: 9472 cores, 68.5 TF, 77.7% efficiency
• Spring 2008, Umea, #40: 5376 cores, 46 TF, 85.5% efficiency
• Spring 2008, Aachen, #100: 2096 cores, 18.8 TF, 76.5% efficiency
• Fall 2007, Microsoft, #116 (Windows HPC Server 2008): 2048 cores, 11.8 TF, 77.1% efficiency
• Spring 2007, Microsoft, #106 (Windows Compute Cluster Server 2003): 2048 cores, 9 TF, 58.8% efficiency
• Spring 2006, NCSA, #130: 896 cores, 4.1 TF

On the same 2048-core Microsoft cluster, Windows HPC Server 2008 delivered roughly a 30% efficiency improvement over Windows Compute Cluster Server 2003.

(Reference: November 2008 Top500 list)
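Reading these figures as the usual Top500 metrics (an interpretation of the slide: the teraflop value as measured LINPACK Rmax and the percentage as efficiency, i.e. Rmax as a share of theoretical peak Rpeak), the arithmetic behind them is simple:

```python
# Back out the implied theoretical peak (Rpeak) from a measured LINPACK
# result (Rmax) and its efficiency percentage.
def implied_rpeak_tf(rmax_tf: float, efficiency_pct: float) -> float:
    return rmax_tf / (efficiency_pct / 100.0)

# Fall 2007 Microsoft entry: 11.8 TF at 77.1% efficiency
print(f"Implied Rpeak: {implied_rpeak_tf(11.8, 77.1):.1f} TF")   # ~15.3 TF

# Relative efficiency gain, CCS 2003 (58.8%) -> HPC Server 2008 (77.1%)
print(f"Improvement: {(77.1 - 58.8) / 58.8:.0%}")  # ~31%, the slide's "30%" figure
```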

• List or heat map view of the cluster at a glance
• Group compute nodes based on hardware, software, and custom attributes; act on groupings

Receive alerts for failures

Track long running operations and access operation history

Pivoting enables correlating nodes and jobs

Simulation Driven Product Development

Engineering simulation software for product development:
• Innovative & higher-quality products
• Dramatic time-to-market improvement
• Minimize development, warranty & liability costs

CAE Simulation Process

CAE simulation cycle, split between an interactive desktop/client process and a computationally intensive process on HPC Server 2008:
• CONCEIVE: concept design, component modeling, parameterization and description of variables and objectives
• COMPUTE: structural analysis, fluid dynamics, fluid-structure interaction, electromagnetics
• UNDERSTAND: visualization, post-processing
• OPTIMIZE: parameter variations, multiple design options

High Performance Computing Drivers

• Bigger simulations
  – More geometric detail
  – More complex physics
• More simulations
  – On-time insight
  – Multiple design options
  – Automated optimization

Drivers: memory (lots), capacity (more), speed (more), data management (lots)

Microsoft – ANSYS Joint Value

To increase compute capacity so that engineers, product designers, and performance evaluators can generate high-fidelity simulation results faster and more economically

To help IT groups leverage in-house skills, existing technologies, and familiar admin/management interfaces by integrating HPC with their Windows infrastructures

Customer Success: Spraying Systems Co.

• Deployed Windows HPC with FLUENT for complex simulation of industrial spray systems, achieving shorter run times for simulations that had been tying up engineering workstations
• Achieved 12X increase in computation speed
• Sample run time reduced from 192 hours to 16 hours
• Enabled more detailed, accurate simulations
• Freed up workstations for setup
• Leveraged existing Windows infrastructure and expertise

“Using a dual or quad core workstation is fine for smaller simulations, but for complex, extensive simulations even multi-core workstations couldn’t complete the computations in a reasonable timeframe.” - Rudolf Schick, Vice President, Spray Analysis & Research Services

Customer Success: Petrobras

• Deployed Windows HPC with ANSYS CFX and FLUENT for upstream and downstream applications; engineers run 5-10 simultaneous simulations using 8-30 cores per job
  – Improved productivity for the research team
  – Simpler, more centralized support of clusters

“With Windows HPC, setup time has decreased from several hours – or even days for some clusters – to just a few minutes, regardless of cluster size.” - IT Manager, Petrobras CENPES

Customers

“It is important that our IT environment is easy to use and support. Windows HPC is improving our performance and manageability.”

-- Dr. J.S. Hurley, Senior Manager, Head Distributed Computing, Networked Systems Technology, The Boeing Company

“Ferrari is always looking for the most advanced technological solutions and, of course, the same applies for software and engineering. To achieve industry leading power-to-weight ratios, reduction in gear change times, and revolutionary aerodynamics, we can rely on Windows HPC Server 2008. It provides a fast, familiar, high performance computing platform for our users, engineers and administrators.”

-- Antonio Calabrese, Responsabile Sistemi Informativi (Head of Information Systems), Ferrari

“Our goal is to broaden HPC availability to a wider audience than just power users. We believe that Windows HPC will make HPC accessible to more people, including engineers, scientists, financial analysts, and others, which will help us design and test products faster and reduce costs.”

-- Kevin Wilson, HPC Architect, Procter & Gamble

Windows HPC Server 2008 Support

• ANSYS R12 supports Windows HPC Server 2008
  – ANSYS Mechanical, ANSYS FLUENT, ANSYS CFX
• Builds on success with ANSYS R11 and Windows CCS
  – Full support for the job scheduler and MS-MPI
  – Improved control: map processes to nodes, sockets, and cores
  – More performance: greatly improved parallel scaling based on NetworkDirect MPI
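As a rough illustration of what scheduler-aware submission looks like on this platform (a sketch, not ANSYS's own integration: ANSYS Workbench drives the scheduler itself, the solver path, arguments, and core count below are hypothetical, and the exact options should be verified with `job submit /?` on your cluster), an MPI run can be handed to the HPC Pack 2008 scheduler from a script:

```python
import subprocess

# Hypothetical parameters for illustration only.
cores = 16

# HPC Pack 2008 ships a 'job' command-line client and MS-MPI's mpiexec.
# The scheduler allocates the requested cores, launches the MPI ranks on the
# assigned nodes, and releases the resources when the task finishes.
submit = [
    "job", "submit",
    f"/numcores:{cores}",                    # resource request (check option names with: job submit /?)
    "/stdout:run.log",                       # capture solver output
    "mpiexec",                               # MS-MPI launcher runs on the scheduled cores
    r"\\headnode\apps\solver\solver.exe",    # placeholder solver executable
    "-input", "model.dat",                   # placeholder solver arguments
]
result = subprocess.run(submit, capture_output=True, text=True)
print(result.stdout or result.stderr)        # typically echoes the new job ID
```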

Job Scheduler Support – ANSYS Workbench

• Allocates the necessary resources to the simulations
• Tracks the processors associated with the job
• Deallocates the resources when the simulation finishes

Performance Gain – ANSYS Mechanical

Chart: ANSYS R11 on WCCS vs. ANSYS R12 on Windows HPC Server 2008. Elapsed-time speedup on the BMD-6 benchmark plotted against number of cores (up to 12), comparing CCS1 with R11 against CCS2 with R12.

ANSYS FLUENT Public Benchmarks

Chart: FLUENT rating (higher is better) versus number of CPU cores (2 to 64) for Windows and Linux.

Conclusion: For ANSYS FLUENT 12.0, Windows HPC Server 2008 delivers the same or faster performance than Linux on the same hardware (HP BL2x220, 3 GHz, with 16 GB memory per node and InfiniBand interconnect).

Reference: dataset is external flow over a truck body, 14M cells, public benchmark. ANSYS FLUENT 12.0 data posted at http://www.fluent.com/software/fluent/fl6bench/new.htm. Windows HPC data run with pre-release ANSYS FLUENT 12.0.16; results run by HP and submitted/approved by ANSYS.
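For readers unfamiliar with the metric used in these charts: the FLUENT benchmark rating, as typically defined on the ANSYS FLUENT public benchmark pages, is the number of benchmark jobs a machine could complete in a 24-hour day, so higher is better and the rating scales inversely with wall-clock time. A small sketch (the elapsed times below are made up for illustration):

```python
SECONDS_PER_DAY = 86_400

def fluent_rating(elapsed_seconds: float) -> float:
    """Benchmark rating = jobs completable per 24-hour day (higher is better)."""
    return SECONDS_PER_DAY / elapsed_seconds

# Illustrative only: a run that takes 12 minutes of wall-clock time
print(f"rating ~ {fluent_rating(12 * 60):.0f}")  # 86400 / 720 = 120

# Relative speedup between two configurations is just the ratio of their ratings
print(f"speedup ~ {fluent_rating(720) / fluent_rating(1440):.1f}x")  # 2.0x
```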

Performance Gain – FLUENT

Chart: parallel scaling improvement on Windows HPC Server 2008. Rating versus number of cores (0 to 32) for FLUENT 12.0.5 on HPC Server 2008 and FLUENT 6.3 on Windows CCS 2003. Benchmark case: exterior flow around a passenger sedan, 3.6M cells, turbulence, pressure-based coupled solver.

• Combined OS and software improvements yield a 24% improvement at 32-way parallel

Performance Gain – Windows HPC Server 2008

Chart: FLUENT 6.3 operating system comparison. Rating versus number of cores (0 to 32) on WCCS 2003 and HPC Server 2008. Benchmark case: exterior flow around a passenger sedan, 3.6M cells, turbulence, pressure-based coupled solver.

• FLUENT 6.3 shows a 20% improvement at 32 cores on Windows HPC Server 2008
• Conclusion: HPC Server 2008 yields significant gains

Performance Scale-Out

Charts: performance tuning in FLUENT 12 on Windows HPC Server 2008, rating versus number of cores, with ideal scaling shown for reference. One chart compares FLUENT 6.3 and FLUENT 12.0.5 over GigE (up to 64 cores); the other compares FLUENT 6.3.33 and FLUENT 12.0.5 over InfiniBand (up to 256 cores). Benchmark case: exterior flow around a passenger sedan, 3.6M cells.

• FLUENT 12 shows near-ideal scaling in tests to 64 cores (GigE network)
• FLUENT 12 and FLUENT 6.3 show near-ideal scaling in tests to 128 cores (IB network)
• Conclusion:
  – GigE scale-out improved with FLUENT 12
  – IB scale-out is now excellent to 128 cores with FLUENT 6.3 or FLUENT 12, and 77% of ideal at 256 cores

Windows HPC Server 2008 vs. Linux

Charts: Windows HPC Server 2008 vs. Linux (RHEL AS 4) on a Dell PowerEdge SC 1435 cluster (Dual-Core Opteron, IB), showing rating and speedup versus number of cores (0 to 64). Benchmark case: exterior flow around a passenger sedan, 3.6M cells, turbulence, pressure-based coupled solver.

• Direct comparison of performance on the same hardware
• The Windows solution is within 6% of the Linux data on this cluster (at 64 cores)
• Conclusion: Windows vs. Linux performance is very comparable

Data courtesy of Dell, Inc. and X-ISS, Inc.

Sizing a Cluster – ANSYS Mechanical

• Typical cluster sizes for ANSYS use 4-8 cores per simulation at R11 (8-16+ for R12)
  – Smaller numerical workloads tend to show diminished scaling beyond this point
• Total RAM on the cluster determines maximum model size
  – Typically, double the RAM on the head node (“core 0”)
• I/O configuration can be a limiting factor
  – There is significant I/O during the computations on the compute nodes

How many cores?

Number of simultaneous CFD simulations on the cluster | CFD model size (number of cells) | Cluster size (number of CPUs or cores)
1 | Up to 2-3M    | 4
2 | Up to 2-3M    | 8
1 | Up to 4-5M    | 8
1 | Up to 8-10M   | 16
2 | Up to 4-5M    | 16
1 | Up to 16-20M  | 32
2 | Up to 8-10M   | 32
4 | Up to 4-5M    | 32
1 | Up to 30-40M  | 64
1 | Up to 70-100M | 128

How much RAM?

• Recommended: 2 GB/core (e.g., 8 GB per dual-processor, dual-core node)
• Total memory requirement increases linearly with the CFD model size:

CFD model size  | RAM requirement
Up to 2M cells  | 2 GB
Up to 5M cells  | 5 GB
Up to 50M cells | 50 GB
(and a continued linear relation as the problem size increases)
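Taken together, the two tables above amount to a simple rule of thumb: roughly 1 GB of RAM per million cells, and a core count that grows with model size and with the number of jobs run side by side. A minimal sketch of that heuristic (the function name and the tier boundaries are illustrative, taken from the tables, not from any ANSYS sizing tool):

```python
def size_cfd_cluster(model_mcells: float, simultaneous_jobs: int = 1) -> dict:
    """Rough cluster sizing from the guidance above (illustrative heuristic only)."""
    # RAM: the table scales at roughly 1 GB per million cells, linearly.
    ram_gb_per_job = max(2, round(model_mcells))

    # Cores: pick the smallest tier from the "How many cores?" table that
    # covers the model size, then scale by the number of simultaneous jobs.
    tiers = [(3, 4), (5, 8), (10, 16), (20, 32), (40, 64), (100, 128)]
    cores_per_job = next(c for max_mcells, c in tiers if model_mcells <= max_mcells)

    return {
        "cores": cores_per_job * simultaneous_jobs,
        "ram_gb": ram_gb_per_job * simultaneous_jobs,
    }

print(size_cfd_cluster(8))     # one ~8M-cell model  -> {'cores': 16, 'ram_gb': 8}
print(size_cfd_cluster(4, 2))  # two ~4M-cell models -> {'cores': 16, 'ram_gb': 8}
```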

HP 16-core Cluster for ANSYS

• Processor: four HP DL380 G5 Woodcrest Xeon-based server nodes, each dual-processor, dual-core (2p/4c*)
  – Four cores per compute node
  – Node 1 doubles as the head node

• Total memory for cluster: up to 80 GB RAM
  – 8 GB/core, or 32 GB total, on the head node
  – 2 or 4 GB/core (8 or 16 GB/node) on each of the 3 remaining compute nodes

• Storage: two 72 GB SAS drives striped RAID 0 on the 3 compute nodes, plus a 5 x 72 GB SAS RAID 0 disk array on the head node

• Interconnect: GigE cluster switch; 100BT management switch

• Operating environment: 64-bit Windows HPC Server 2008

• Workloads: 56-80 GB RAM configurations will handle ANSYS “megamodels” of 50M DOF

*2 processors, 4 cores total per node=dual core

HP 32-core BladeSystem for CFD

• Processor options: up to 8 Xeon (BL460c) or Opteron (BL465c) compute nodes, each dual-processor, dual-core (2p/4c*)
  – 4 cores per compute node; 72 GB SAS drive
  – Node 1 can be used as the head node and/or pre/post-processing node; two 72 GB SAS drives suitable for the head node
• Total memory for the cluster: 64-88 GB RAM
  – 2 GB/core (8 GB/node) on compute nodes and head node
  – 8 GB/core (32 GB on node 1) if using the head node for pre/post-processing
• Storage: optional SB40c storage blade on the head node (up to 6 SFF SAS drives); option to configure one node for storage; extended direct attached storage
• Interconnect: integrated Gigabit Ethernet or InfiniBand DDR, and a management network

• 64-bit Windows HPC Server 2008

• Ideally suited for FLUENT or CFX models up to 50M cells, or for running 3-4 simultaneous fluids models on the scale of 10M cells

*2 processors, 4 cores total per node=dual core

Microsoft HPC in the Future

Personal Super Computing:
• Microsoft entry into HPC
• Personal and workgroup HPC
• Transparent user access
• Parallel and HPC development tools
• Ease of management and deployment

Broad Reaching HPC:
• Support traditional & emerging technical computing
• Larger cluster support & Top500 range
• End-user applications available for Windows
• Greater accessibility for Windows-based users
• Broader developer support with tools and SOA
• Improved management and deployment

Seamless Parallelism (Futures, including Parallel Extensions):
• Parallel computing everywhere
• Ultra-scale/cloud computing
• Implicit parallelism for .NET developers
• Dynamic and virtualized workloads
• Mainstream management of HPC and IT infrastructure

Additional Information

• Microsoft HPC Web site (Evaluate Today!): http://www.microsoft.com/hpc
• Windows HPC Community site: http://www.windowshpc.net
• Windows HPC TechCenter: http://technet.microsoft.com/en-us/hpc/default.aspx
• HPC on MSDN: http://code.msdn.microsoft.com/hpc
• Windows Server Compare website: http://www.microsoft.com/windowsserver/compare/default.mspx

Questions?

• Let us know how Microsoft and your local ANSYS representative can help you plan and provide the computing resources you need for simulation.

Taking HPC Mainstream

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.