Linux64 Im CAE Umfeld
Silicon Graphics, Inc.
Linux64Linux64 imim CAECAE UmfeldUmfeld
3.3. LSLS--DYNADYNA ForumForum
14.-15. Oktober 2004 in Bamberg
Josef Hellauer [email protected] AgendaAgenda
• HPC im CAE-Bereich
• Systemanforderung CAE
• Linux64-Server SGI Altix
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 2 CAECAE ApplicationApplication SegmentsSegments inin AutomotiveAutomotive 20032003
-Crash is the application segment #1 Others Mfg Processes 0% CFD 2% -The fastest growing 20% application is CFD, won 3% shares from 2002
Structural -NVH is the third application 8% Crash segment. However, it is the 58% NVH most demanding in terms of 12% memory and IO bandwidth.
-10 CAE Applications drive 89% of the installed compute power CFD - Computational Fluid Dynamics NHV - Noise, Vibration, Harshness Source: Top20Auto, Ch.Tanasescu, SC’2003, www.top500.org MFG Processes - Stamping, Casting, Forging http://www.top500.org/lists/2003/11/Top20Auto_Top500V2.pdf
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 3 KeyKey CAECAE ApplicationsApplications inin AutomotiveAutomotive
Powerflow FireVectis LS-Dyna 6,3% 1,2%1,1% Nastran 12000 Abaqus LS-Dyna Pam-Crash 7,0% 23,3% Fluent Radioss 10000 7,4% StarCD 79% Nastran Fluent 8000 StarCD growth 11,5% 15,8% Abaqus Radioss Powerflow Pam-Crash 6000 7,1% Fire 19,4% Vectis 4000
Installed GFlops Installed 2002 2000 2003 11600 7060 6940 4861 2974 2943 2429 537 528 0 3068
a n h n as r tra ent ss s S-Dy s u u ow re is L Na StarCD Fl dio aq fl Fi t a b er ec Pam-C R A w V Other applications: Marc, Ansys, Madymo, AMLS, Pam-Stamp,Po Magma, Autoform, Permas Source: Top20Auto, Ch.Tanasescu, SC’2003, www.top500.org http://www.top500.org/lists/2003/11/Top20Auto_Top500V2.pdf
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 4 SystemSystem ArchitecturesArchitectures inin AutomotiveAutomotive WorldWorld WideWide -- 20032003
Ex: Cluster of IA32 ( 1-2p per node) 62% are SMP based systems
CLUSTER 33% SMP 35% Ex: SGI Altix 3000, IBM p690, HP Superdome, NEC SX-6, SGI Origin3000
MPP 5% CONSTELLATION 27% Ex: Cluster of SMP nodes like Ex: IBM SP3, SGI Altix 3000, IBM p690, CPQ E45, Fujitsu VPP5000 HP Superdome, SGI Origin3000 Notes CONSTELLATION - cluster of SMP nodes Source: Top20Auto, Ch.Tanasescu, SC’2003, www.top500.org CLUSTER - cluster of IA32 (1 or 2p) http://www.top500.org/lists/2003/11/Top20Auto_Top500V2.pdf
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 5 AA HistoryHistory ofof InnovationInnovation ofof SGISGI
Challenge® XL media server fuels Steven Spielberg’s Shoah NASA Ames and project to document Altix® set world Power Series™, Holocaust survivor record for multi-processing stories STREAMS systems provide Jim Clark benchmark compute power First systems founded SGI on SGI introduces for high-end deployed in Stephen the vision of its first 64-bit graphics Hawking’s COSMOS Altix®, first scalable Computer operating applications system 64-bit Linux® Server Visualization system
1982 1984 1988 1994 1995 1996 1997 1998 2001 2003 2004
DOE deploys 6144p Introduced First generation Origin 2000 to IRIS® Workstations modular NUMA System: monitor and NUMAflex™ become first integrated Origin® 2000 simulate nuclear 3D graphics systems architecture stockpile with Origin® 3000
First 512p Altix Dockside engineering analysis on Origin® cluster drives ocean 2000 and Indigo2™ helps Team New research at NASA Zealand win America’s Cup Ames
Images courtesy of Team New Zealand and the University of Cambridge LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 6
R SGISGI FamilyFamily ofof ScalableScalable LinuxLinux® SolutionsSolutions SGI® Altix® 350 SGI® Altix® 3000 Servers and Clusters Servers and Superclusters
Other vendors’ Linux® Solutions
Image courtesy: NASA Ames
Desktops Edge servers Mid-range Divisional Capability
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 7
AA ModularModular SGISGI® NUMAflexNUMAflex™™ ArchitectureArchitecture System Performance:Performance: High-bandwidthHigh-bandwidth Interconnect interconnectinterconnect withwith veryvery lowlow latencylatency
Flexibility:Flexibility: TailoredTailored configurationsconfigurations forfor CPU & differentdifferent dimensionsdimensions ofof scalabilityscalability Memory InvestmentInvestment protection:protection: AddAdd newnew technologiestechnologies asas theythey evolveevolve System I/O Scalability:Scalability: NoNo centralcentral busbus oror switch;switch; justjust Standard I/O modulesmodules andand NUMAlink™NUMAlink™ cables cables Expansion High-Bandwidth I/O Expansion
Graphics Expansion
Storage Expansion
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 8 SiliconSilicon GraphicsGraphics PrismPrism™™ VisualizationVisualization SystemSystem InnovationInnovation thatthat SCALESSCALES
“Power” “Team” “Extreme” Power User Small-Group Ultimate Visualization Visualization Visualization Capability $ $75k - $250k $200k - $2,000k 2 - 4 GPUs, 2 - 8 CPUs 4-8 GPUs, 8-16 CPUs 8-16 GPUs, 16-32 CPUs
Delivers Delivers productivity Delivers ultimate advantage breakthroughs
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 9
Image courtesy of Laboratory for Atomospheres, NASA Goddard Space Flight Center SiliconSilicon GraphicsGraphics PrismPrism™™ VisualizationVisualization System ArchitectureSystem Makes A Difference SGI® NUMAflex™ independently scales all resources
Globally Addressable Memory C o m p o s Large I t Data Scalable Scalable o r Compute Graphics Sets N e t w o r k Scalable Software Development Scene Graphs, Toolkits and Transparent Scaling
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 10 SGISGI AltixAltix 37003700 andand 350350 ServersServers WithWith GlobalGlobal AddressableAddressable MemoryMemory
• Supercomputer, supercluster configurations • SSI up tp 512 processors, • Supercluster up to 2048p via NUMAlink Image courtesy: NASA Ames • Up to 8TB of shared memory • 6.4GB/sec dual plane fat tree → 12.8GB/sec with NUMAlink 4 router • 1.3GHz / 3.0MB to 1.6GHz / 9.0MB Intel Itanium2 processors Altix™ 3700 • Infinitely scalable using commercial interconnects
Partition 1 Altix™ 350 Partition 2
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 11 InfinibandInfiniband®®/Altix/Altix 350350 ClustersClusters
• Competitive advantages: – large host size (16 P) – SGI ProPack software features and support – MPT MPI
Infiniband Switch
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 12 SGISGI ProductionProduction LinuxLinux SolutionSolution StackStack
™ SGI ProPack System management: Partitioning (Altix 3000), Performance Co-Pilot HPC Value-Add Enhancements Resource management: CPU sets/memory placement, MPT, array services, SCSL math libraries
Optimized high-productivity Data management: XVM computing
® SGI Open-Source Contributing SGI expertise in scalability, NUMA Enhancements XFS® high-performance filesystem
Enabling features and functionality
® Standard Linux Red Hat® Enterprise Linux® AS 3.0* compatible for highest performance Distribution Certified SUSE Linux 9 option for highest certified support levels
Runs standard 64-bit Linux Easy to develop and administer applications Intel® compilers and tools
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 13 HPCHPC ResourceResource DemandsDemands forfor MCAEMCAE
Generally Five Types of MCAE Computational Behavior to Consider::
MCAE Segment Software System Resource Benefits/Requirements CPU Memory BW I/O Bandwidth Latency Scalability ABAQUS® IFEA ANSYS® Statics MSC.NASTRAN™ HH M LL< 10p
ABAQUS® IFEA ANSYS® Dynamics MSC.NASTRAN™ M HHHL < 10p
LS-DYNA® …. EFEA ….. HH MM LL MM HH ~~ 64p64p
FLUENT® CFD STAR-CD™ Unstructured PowerFLOW® M H M H H ~ 100p
CFD FLOWER Structured OVERFLOW HH L MM> 100p
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 14 LDLD--DynaDyna mpp970mpp970 CommunicationCommunication MatrixMatrix
1D LS-Dyna Profile for 30 CPUs - Total MBytes
28
26
24
22
25-30 20 20-25 18 2D 15-20 16 Receiver 10-15 14 12 5-10 10 0-5 8
6
4
2
0 3D 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 Sender
90% or messages shorter than 3kbyte , Latency is important for scalability
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 15 InterconnectInterconnect ContentionContention dependsdepends onon TopologyTopology MPI Communication trace
Communication time slices BW -- Topo A Topo B BW ++ Interconnection Topology concurrent communications link saturation ?
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 16 SScacallaablblee CPU/MCPU/Memoryemory IInntterconnecerconnectt FabricFabric 128-Processor163264 Processor System System
C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick Scalable C-Brick C-Brick Compute C-Brick C-Brick and C-Brick C-Brick C-Brick C-Brick C-Brick Large C-Brick C-Brick C-Brick Memory C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick C-Brick Bisection C-Brick C-Brick Bandwidth Scalable Compute S h a r e d M e m o r y
Total system bisection bandwidth actually scales as system grows!
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 17 SkalierbareSkalierbare LeistungLeistung imim VergleichVergleich
PC Cl ust er 3.06GHz Alt ix 1.3GHz El apsed Ti me El apsed Ti me PCcluster/ Altix Scalability Scalability CPUs sec hours sec hours 4 56479 15:41:19 1.00 48611 13:30:11 1.00 1.16 8 30461 8:27:41 1.85 25463 7:04:23 1.91 1.20 16 18157 5:02:37 3.11 13012 3:36:52 3.74 1.40 32 14610 4:03:30 3.87 7220 2:00:20 6.73 2.02 64 13162 3:39:22 4.29 4456 1:14:16 10.9 2.95 PC Clust er 3.06GHz Altix 1.3GHz PC Clust er 3.06GHz Altix 1.3GHz 60000 13.00 ) 50000 11.00 sec ( 40000 y 9.00 30000 7.00 Scalabilit sed Time20000 5.00 p
Ela 10000 3.00 1.00 0 4 8 16 32 64 CPUs 4244464 CPUs
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 18 Beispiel:Beispiel: LSLS--DYNADYNA StampingStamping > 800.000 ElementeSimulation Time = 21 msec
Altix 3700, 1500MHz, MPP970 Altix 3700, 1300MHz, MPP970
6,8 XX Itanium2, 1.5 GHz 16 7,9 Athlon, 2.2 GHz, Myrinet 10,1
22,8 CPUs 11,8
13,8 8 16,7
0 4 8 12 16 20 24 Elapsed Times Total [h]
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 19 Beispiel:Beispiel: LSLS--DYNADYNA StampingStamping > 800.000 ElementeSimulation Time = 21 msec
Performance Relation vs. [email protected], Athlon
3,5 Altix 3700, 1500MHz, MPP970 3,3 Altix 3700, 1300MHz, MPP970 3,0 XX Itanium2, 1.5 GHz 2,9 Athlon, 2.2 GHz, Myrinet 2,5 2,3 2,0 1,9
1,6 1,5 1,4 1,0 1,0 4 8 12 16
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 20 SGISGI® AltixAltix™™ undund ccNUMAccNUMA imim CAECAE
Der 64-bit Adressraum der Itanium®2 CPUs erlaubt CAE-Anwendungen mit Speicherbedarf >> 4 GB • kurzfristige Erweiterung (höhere Diskretisierungen, größere Modelle) des virtuellen und physikalischen Speichers ohne Einschränkungen und Systemwechsel (HW+SW), • die performante Nutzung von In-Core-Solver >> 4 GB • das Testen und Debuggen von „Problem-Modellen“ auf 1 CPU im SMP und MPP-Modus: Hohe I/O-Bandbreiten + SGI-Filesystem XFS beschleunigen Out-of-Core Solver im NVH-Bereich. Numaflex + optimierte OS-System Funktionen ergeben beste Durchsatzleistungen (s. SPECfp_rate2000)
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 22 NewNew DiscoveriesDiscoveries withwith NASANASA ECCOECCO EstimatingEstimating thethe CirculationCirculation andand ClimateClimate ofof thethe OceanOcean The Challenge The Solution • Assimilate real-time • 512p Altix®, 28TB satellite data directly into storage into global ocean • Reducing typical circulation models ocean simulation • Drive resolution from times from months to today’s .25 degree 2-3 days (~25km) to .1 degree •World Record STREAM Triad benchmark result, first system to break 1,000 GB/sec
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 23
AA OptimizingOptimizing DecisionsDecisions atat ŠŠkodakoda AutoAuto
Škoda Auto Mladá Boleslav, Czech Republic The Challenge The Solution Reduce processing • Deployed 96p Altix® times for crash analysis 3700 cluster, 16TB and fluid dynamics InfiniteStorage disk analysis array • Reduced processing times allow Skoda to optimize decisions around car development and design efficiency, quality and safety
Images courtesy of Škoda Auto LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 24
AA AltixAltix®:: BreakthroughBreakthrough DeploymentsDeployments
Breakthrough Chemistry and Biology Oak Ridge National Lab, University of Minnesota Gaussian, Amber, Blast, Charm, Supercomputing Institute, National Cancer applications Accelrys and more Institute, many universities Manufacturing DaimlerChrysler,VW, Skoda, many more LS-Dyna, Abaqus, Star-CD, NASTRAN, Ansys and more
Physical Sciences COSMOS, NASA, University of Queensland, Modeling of oceans, weather, Total, Japan Institute of Statistical Mathematics, earth, astronomy, energy, and many more more Large shared memory Oak Ridge National Labs (2TB), Japan Institute Breakthrough of Statistics and Mathematics (1.96TB), NRL technologies (1TB), and others Large processor scalability NASA Ames (512p), NRL (384p), SARA (416p), Total (552p), and others Grid computing COSMOS, Netherlands National Supercomputing Facility at SARA, and others Hybrid solutions Wichita State University, BP, and others ProvenImage courtesy: NASA Ames Early adopters expanding Total, NASA Ames, NRL, and others performance their deployments
Images courtesy of the University of Cambridge, Molecular Simulations Inc., Skoda Auto, and NASA Ames LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 25 ZusammenfassungZusammenfassung
• Leistungsfähigstes Linux® Computersystem der Welt • Intel® Itanium®2 CPUs mit SGI® NUMAflex™ Architektur • Single System Image mit bis zu 512 CPUs • Shared-Memory Skalierbarkeit bis zu 8TB • MPI mit Shared Memory Features • Linux Erweiterungen für HPC Workflow optimiert • Höchste Bandbreite und kürzeste Latenzzeiten • Standard Linux Distribution und Entwicklungsplatform • Differenzierte “Middleware” und Treiber • Hohe Verfügbarkeit von 64bit Anwendungen • Integration von skalierbarer Graphik möglich
LS-DYNA Forum, Bamberg 14. - 15. Oktober 2004 26