Quick viewing(Text Mode)

Linux64 Im CAE Umfeld

Linux64 Im CAE Umfeld

Silicon , Inc.

Linux64Linux64 imim CAECAE UmfeldUmfeld

3.3. LSLS--DYNADYNA ForumForum

14.-15. Oktober 2004 in Bamberg

Josef Hellauer [email protected] AgendaAgenda

• HPC im CAE-Bereich

• Systemanforderung CAE

• Linux64-Server SGI

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 2 CAECAE ApplicationApplication SegmentsSegments inin AutomotiveAutomotive 20032003

-Crash is the application segment #1 Others Mfg Processes 0% CFD 2% -The fastest growing 20% application is CFD, won 3% shares from 2002

Structural -NVH is the third application 8% Crash segment. However, it is the 58% NVH most demanding in terms of 12% memory and IO bandwidth.

-10 CAE Applications drive 89% of the installed compute power CFD - Computational Fluid Dynamics NHV - Noise, Vibration, Harshness Source: Top20Auto, Ch.Tanasescu, SC’2003, www..org MFG Processes - Stamping, Casting, Forging http://www.top500.org/lists/2003/11/Top20Auto_Top500V2.pdf

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 3 KeyKey CAECAE ApplicationsApplications inin AutomotiveAutomotive

Powerflow FireVectis LS-Dyna 6,3% 1,2%1,1% Nastran 12000 Abaqus LS-Dyna Pam-Crash 7,0% 23,3% Fluent Radioss 10000 7,4% StarCD 79% Nastran Fluent 8000 StarCD growth 11,5% 15,8% Abaqus Radioss Powerflow Pam-Crash 6000 7,1% Fire 19,4% Vectis 4000

Installed GFlops Installed 2002 2000 2003 11600 7060 6940 4861 2974 2943 2429 537 528 0 3068

a n h n as r tra ent ss s S-Dy s u u ow re is L Na StarCD Fl dio aq fl Fi t a b er ec Pam-C R A w V Other applications: Marc, Ansys, Madymo, AMLS, Pam-Stamp,Po Magma, Autoform, Permas Source: Top20Auto, Ch.Tanasescu, SC’2003, www.top500.org http://www.top500.org/lists/2003/11/Top20Auto_Top500V2.pdf

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 4 SystemSystem ArchitecturesArchitectures inin AutomotiveAutomotive WorldWorld WideWide -- 20032003

Ex: Cluster of IA32 ( 1-2p per node) 62% are SMP based systems

CLUSTER 33% SMP 35% Ex: SGI Altix 3000, IBM p690, HP Superdome, NEC SX-6, SGI Origin3000

MPP 5% CONSTELLATION 27% Ex: Cluster of SMP nodes like Ex: IBM SP3, SGI Altix 3000, IBM p690, CPQ E45, Fujitsu VPP5000 HP Superdome, SGI Origin3000 Notes CONSTELLATION - cluster of SMP nodes Source: Top20Auto, Ch.Tanasescu, SC’2003, www.top500.org CLUSTER - cluster of IA32 (1 or 2p) http://www.top500.org/lists/2003/11/Top20Auto_Top500V2.pdf

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 5 AA HistoryHistory ofof InnovationInnovation ofof SGISGI

Challenge® XL media server fuels Steven Spielberg’s Shoah NASA Ames and project to document Altix® set world Power Series™, Holocaust survivor record for multi-processing stories STREAMS systems provide Jim Clark benchmark compute power First systems founded SGI on SGI introduces for high-end deployed in Stephen the vision of its first 64-bit graphics Hawking’s COSMOS Altix®, first scalable Computer operating applications system 64-bit ® Server system

1982 1984 1988 1994 1995 1996 1997 1998 2001 2003 2004

DOE deploys 6144p Introduced First generation Origin 2000 to IRIS® modular NUMA System: monitor and NUMAflex™ become first integrated Origin® 2000 simulate nuclear 3D graphics systems architecture stockpile with Origin® 3000

First 512p Altix Dockside engineering analysis on Origin® cluster drives ocean 2000 and Indigo2™ helps Team New research at NASA Zealand win America’s Cup Ames

Images courtesy of Team New Zealand and the University of Cambridge LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 6

R SGISGI FamilyFamily ofof ScalableScalable LinuxLinux® SolutionsSolutions SGI® Altix® 350 SGI® Altix® 3000 Servers and Clusters Servers and Superclusters

Other vendors’ Linux® Solutions

Image courtesy: NASA Ames

Desktops Edge servers Mid-range Divisional Capability

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 7

AA ModularModular SGISGI® NUMAflexNUMAflex™™ ArchitectureArchitecture System Performance:Performance: High-bandwidthHigh-bandwidth Interconnect interconnectinterconnect withwith veryvery lowlow latencylatency

Flexibility:Flexibility: TailoredTailored configurationsconfigurations forfor CPU & differentdifferent dimensionsdimensions ofof scalabilityscalability Memory InvestmentInvestment protection:protection: AddAdd newnew technologiestechnologies asas theythey evolveevolve System I/O Scalability:Scalability: NoNo centralcentral busbus oror switch;switch; justjust Standard I/O modulesmodules andand NUMAlink™NUMAlink™ cables cables Expansion High-Bandwidth I/O Expansion

Graphics Expansion

Storage Expansion

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 8 SiliconSilicon GraphicsGraphics PrismPrism™™ VisualizationVisualization SystemSystem InnovationInnovation thatthat SCALESSCALES

“Power” “Team” “Extreme” Power User Small-Group Ultimate Visualization Visualization Visualization Capability $ $75k - $250k $200k - $2,000k 2 - 4 GPUs, 2 - 8 CPUs 4-8 GPUs, 8-16 CPUs 8-16 GPUs, 16-32 CPUs

Delivers Delivers productivity Delivers ultimate advantage breakthroughs

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 9

Image courtesy of Laboratory for Atomospheres, NASA Goddard Space Flight Center SiliconSilicon GraphicsGraphics PrismPrism™™ VisualizationVisualization System ArchitectureSystem Makes A Difference SGI® NUMAflex™ independently scales all resources

Globally Addressable Memory C o m p o s Large I t Data Scalable Scalable o r Compute Graphics Sets N e t w o r k Scalable Development Scene Graphs, Toolkits and Transparent Scaling

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 10 SGISGI AltixAltix 37003700 andand 350350 ServersServers WithWith GlobalGlobal AddressableAddressable MemoryMemory

, supercluster configurations • SSI up tp 512 processors, • Supercluster up to 2048p via NUMAlink Image courtesy: NASA Ames • Up to 8TB of shared memory • 6.4GB/sec dual plane fat tree → 12.8GB/sec with NUMAlink 4 router • 1.3GHz / 3.0MB to 1.6GHz / 9.0MB Itanium2 processors Altix™ 3700 • Infinitely scalable using commercial interconnects

Partition 1 Altix™ 350 Partition 2

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 11 InfinibandInfiniband®®/Altix/Altix 350350 ClustersClusters

• Competitive advantages: – large host size (16 P) – SGI ProPack software features and support – MPT MPI

Infiniband Switch

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 12 SGISGI ProductionProduction LinuxLinux SolutionSolution StackStack

™ SGI ProPack System management: Partitioning (Altix 3000), Performance Co-Pilot HPC Value-Add Enhancements Resource management: CPU sets/memory placement, MPT, array services, SCSL math libraries

Optimized high-productivity Data management: XVM computing

® SGI Open-Source Contributing SGI expertise in scalability, NUMA Enhancements XFS® high-performance filesystem

Enabling features and functionality

® Standard Linux Red Hat® Enterprise Linux® AS 3.0* compatible for highest performance Distribution Certified SUSE Linux 9 option for highest certified support levels

Runs standard 64-bit Linux Easy to develop and administer applications Intel® and tools

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 13 HPCHPC ResourceResource DemandsDemands forfor MCAEMCAE

Generally Five Types of MCAE Computational Behavior to Consider::

MCAE Segment Software System Resource Benefits/Requirements CPU Memory BW I/O Bandwidth Latency Scalability ABAQUS® IFEA ANSYS® Statics MSC.NASTRAN™ HH M LL< 10p

ABAQUS® IFEA ANSYS® Dynamics MSC.NASTRAN™ M HHHL < 10p

LS-DYNA® …. EFEA ….. HH MM LL MM HH ~~ 64p64p

FLUENT® CFD STAR-CD™ Unstructured PowerFLOW® M H M H H ~ 100p

CFD FLOWER Structured OVERFLOW HH L MM> 100p

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 14 LDLD--DynaDyna mpp970mpp970 CommunicationCommunication MatrixMatrix

1D LS-Dyna Profile for 30 CPUs - Total MBytes

28

26

24

22

25-30 20 20-25 18 2D 15-20 16 Receiver 10-15 14 12 5-10 10 0-5 8

6

4

2

0 3D 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 Sender

90% or messages shorter than 3kbyte , Latency is important for scalability

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 15 InterconnectInterconnect ContentionContention dependsdepends onon TopologyTopology MPI Communication trace

Communication time slices BW -- Topo A Topo B BW ++ Interconnection Topology concurrent communications link saturation ?

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 16 SScacallaablblee CPU/MCPU/Memoryemory IInntterconnecerconnectt FabricFabric 128-Processor163264 Processor System System

C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick Scalable C-Brick C-Brick Compute C-Brick C-Brick and C-Brick C-Brick C-Brick C-Brick C-Brick Large C-Brick C-Brick C-Brick Memory C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick Bisection C-Brick C-Brick Bandwidth Scalable Compute S h a r e d M e m o r y

Total system bisection bandwidth actually scales as system grows!

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 17 SkalierbareSkalierbare LeistungLeistung imim VergleichVergleich

PC Cl ust er 3.06GHz Alt ix 1.3GHz El apsed Ti me El apsed Ti me PCcluster/ Altix Scalability Scalability CPUs sec hours sec hours 4 56479 15:41:19 1.00 48611 13:30:11 1.00 1.16 8 30461 8:27:41 1.85 25463 7:04:23 1.91 1.20 16 18157 5:02:37 3.11 13012 3:36:52 3.74 1.40 32 14610 4:03:30 3.87 7220 2:00:20 6.73 2.02 64 13162 3:39:22 4.29 4456 1:14:16 10.9 2.95 PC Clust er 3.06GHz Altix 1.3GHz PC Clust er 3.06GHz Altix 1.3GHz 60000 13.00 ) 50000 11.00 sec ( 40000 y 9.00 30000 7.00 Scalabilit sed Time20000 5.00 p

Ela 10000 3.00 1.00 0 4 8 16 32 64 CPUs 4244464 CPUs

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 18 Beispiel:Beispiel: LSLS--DYNADYNA StampingStamping > 800.000 ElementeSimulation Time = 21 msec

Altix 3700, 1500MHz, MPP970 Altix 3700, 1300MHz, MPP970

6,8 XX Itanium2, 1.5 GHz 16 7,9 Athlon, 2.2 GHz, Myrinet 10,1

22,8 CPUs 11,8

13,8 8 16,7

0 4 8 12 16 20 24 Elapsed Times Total [h]

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 19 Beispiel:Beispiel: LSLS--DYNADYNA StampingStamping > 800.000 ElementeSimulation Time = 21 msec

Performance Relation vs. [email protected], Athlon

3,5 Altix 3700, 1500MHz, MPP970 3,3 Altix 3700, 1300MHz, MPP970 3,0 XX Itanium2, 1.5 GHz 2,9 Athlon, 2.2 GHz, Myrinet 2,5 2,3 2,0 1,9

1,6 1,5 1,4 1,0 1,0 4 8 12 16

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 20 SGISGI® AltixAltix™™ undund ccNUMAccNUMA imim CAECAE

‰ Der 64-bit Adressraum der ®2 CPUs erlaubt CAE-Anwendungen mit Speicherbedarf >> 4 GB • kurzfristige Erweiterung (höhere Diskretisierungen, größere Modelle) des virtuellen und physikalischen Speichers ohne Einschränkungen und Systemwechsel (HW+SW), • die performante Nutzung von In-Core-Solver >> 4 GB • das Testen und Debuggen von „Problem-Modellen“ auf 1 CPU im SMP und MPP-Modus: ‰ Hohe I/O-Bandbreiten + SGI-Filesystem XFS beschleunigen Out-of-Core Solver im NVH-Bereich. ‰ Numaflex + optimierte OS-System Funktionen ergeben beste Durchsatzleistungen (s. SPECfp_rate2000)

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 22 NewNew DiscoveriesDiscoveries withwith NASANASA ECCOECCO EstimatingEstimating thethe CirculationCirculation andand ClimateClimate ofof thethe OceanOcean The Challenge The Solution • Assimilate real-time • 512p Altix®, 28TB satellite data directly into storage into global ocean • Reducing typical circulation models ocean simulation • Drive resolution from times from months to today’s .25 degree 2-3 days (~25km) to .1 degree •World Record STREAM Triad benchmark result, first system to break 1,000 GB/sec

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 23

AA OptimizingOptimizing DecisionsDecisions atat ŠŠkodakoda AutoAuto

Škoda Auto Mladá Boleslav, Czech Republic The Challenge The Solution Reduce processing • Deployed 96p Altix® times for crash analysis 3700 cluster, 16TB and fluid dynamics InfiniteStorage disk analysis array • Reduced processing times allow Skoda to optimize decisions around car development and design efficiency, quality and safety

Images courtesy of Škoda Auto LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 24

AA AltixAltix®:: BreakthroughBreakthrough DeploymentsDeployments

Breakthrough Chemistry and Biology Oak Ridge National Lab, University of Minnesota Gaussian, Amber, Blast, Charm, Supercomputing Institute, National Cancer applications Accelrys and more Institute, many universities Manufacturing DaimlerChrysler,VW, Skoda, many more LS-Dyna, Abaqus, Star-CD, NASTRAN, Ansys and more

Physical Sciences COSMOS, NASA, University of Queensland, Modeling of oceans, weather, Total, Japan Institute of Statistical Mathematics, earth, astronomy, energy, and many more more Large shared memory Oak Ridge National Labs (2TB), Japan Institute Breakthrough of Statistics and Mathematics (1.96TB), NRL technologies (1TB), and others Large processor scalability NASA Ames (512p), NRL (384p), SARA (416p), Total (552p), and others COSMOS, Netherlands National Supercomputing Facility at SARA, and others Hybrid solutions Wichita State University, BP, and others ProvenImage courtesy: NASA Ames Early adopters expanding Total, NASA Ames, NRL, and others performance their deployments

Images courtesy of the University of Cambridge, Molecular Simulations Inc., Skoda Auto, and NASA Ames LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 25 ZusammenfassungZusammenfassung

• Leistungsfähigstes Linux® Computersystem der Welt • Intel® Itanium®2 CPUs mit SGI® NUMAflex™ Architektur • Single System Image mit bis zu 512 CPUs • Shared-Memory Skalierbarkeit bis zu 8TB • MPI mit Shared Memory Features • Linux Erweiterungen für HPC Workflow optimiert • Höchste Bandbreite und kürzeste Latenzzeiten • Standard Linux Distribution und Entwicklungsplatform • Differenzierte “Middleware” und Treiber • Hohe Verfügbarkeit von 64bit Anwendungen • Integration von skalierbarer Graphik möglich

LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 26