
Department of Defense High Performance Computing Modernization Program

Supercomputing: Performance Measures and Opportunities

Cray J. Henry, Director, HPCMP
August 2004

http://www.hpcmo.hpc.mil

Presentation Outline

- What's New in the HPCMP
  - New hardware
  - HPC Software Application Institutes
  - Capability Allocations
  - Open Research Systems
  - On-demand Computing
- Performance Measures - HPCMP
- Performance Measures - Challenges & Opportunities

HPCMP Centers

[Maps: HPCMP center locations in 1993 and in 2004. Legend: MSRCs; ADCs and DDCs.]

Total HPCMP End-of-Year Computational Capabilities

[Chart: total HPCMP end-of-year computational capability by fiscal year, 1993 through 2004, for MSRCs and ADCs/DCs, plotted both in peak GFLOPS and in HABUs: over 400X growth across the period.]

HPCMP Systems (MSRCs)

HPC Center | System | Processors
Army Research Laboratory (ARL) | IBM P3 | 1,280 PEs
 | SGI Origin 3800 | 256 PEs
 | SGI Origin 3800 | 512 PEs
 | IBM P4 | 768 PEs
 | IBM P4 | 128 PEs
 | Linux Networx Cluster | 256 PEs
 | LNX1 Cluster | 2,100 PEs
 | IBM Cluster | 2,372 PEs
 | SGI Cluster | 256 PEs
Aeronautical Systems Center (ASC) | Compaq SC-45 | 836 PEs
 | IBM P3 | 528 PEs
 | Compaq SC-40 | 64 PEs
 | SGI Origin 3900 | 2,048 PEs
 | SGI Origin 3900 | 128 PEs
 | IBM P4 | 32 PEs
Engineer Research and Development Center (ERDC) | Compaq SC-40 | 512 PEs
 | Compaq SC-45 | 512 PEs
 | SGI Origin 3800 | 512 PEs
 | Cray T3E | 1,888 PEs
 | SGI Origin 3900 | 1,024 PEs
 | SGI Origin 3900 | 64 PEs
Naval Oceanographic Office (NAVO) | IBM P4 | 1,408 PEs
 | Cray SV1 | 64 PEs
 | IBM P4 | 3,456 PEs

(Major Shared Resource Centers. Shading in the original denotes acquisition year: FY 01 and earlier, FY 02, FY 03, FY 04.)

HPCMP Systems (ADCs)

HPC Center | System | Processors
Army High Performance Computing Research Center (AHPCRC) | Cray X1, LC | 1,088 PEs / 128 PEs / 64 PEs
Arctic Region Supercomputing Center (ARSC) | Cray T3E | 272 PEs
 | Cray SV1 | 32 PEs
 | IBM P3 | 200 PEs
 | IBM Regatta P4 | 800 PEs
 | Cray X1 | 128 PEs
Maui High Performance Computing Center (MHPCC) | IBM P3 (2) | 736/320 PEs
 | IBM Netfinity | 512 PEs
 | Cluster | 320 PEs
 | IBM P4 |
Space & Missile Defense Command (SMDC) | SGI Origins | 1,200 PEs
 | Cray SV-1 | 32 PEs
 | W.S. Cluster | 64 PEs
 | IBM e1300 Cluster | 256 PEs
 | Linux Cluster | 256 PEs
 | IBM Regatta P4 | 32 PEs

(Shading in the original denotes acquisition year: FY 01 and earlier, FY 02, FY 03, FY 04 upgrades.)

Why is the date important? Generally we see price-performance gains of ~1.68x per year (e.g., 2001 = 1x, 2002 = 1.68x, 2003 = 2.82x, 2004 = 4.74x).
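As a worked version of those multipliers, a constant ~1.68x annual gain simply compounds from the 2001 baseline; a minimal Python sketch:

```python
# Compound a ~1.68x annual price-performance gain from a 2001 baseline.
# Reproduces the slide's multipliers: 1.0, 1.68, 2.82, 4.74.
annual_gain = 1.68
for year in range(2001, 2005):
    print(year, round(annual_gain ** (year - 2001), 2))
```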

HPCMP Dedicated Distributed Centers

Location | System | Processors
Arnold Engineering Development Center (AEDC) | HP Superdome | 32 PEs
 | IBM Cluster | 16 PEs
 | IBM Regatta P4 | 64 PEs
 | Pentium Cluster | 8 PEs
Air Force Research Laboratory, Information Directorate (AFRL/IF) | Sky HPC-1 | 384 PEs
Air Force Weather Agency (AFWA) | IBM Regatta P4 | 96 PEs
 | Heterogeneous HPC | 96 PEs
Aberdeen Test Center (ATC) | Powerwulf | 32 PEs
 | Powerwulf | 32 PEs
Fleet Numerical Meteorology and Oceanography Center (FNMOC) | SGI Origin 3900 | 256 PEs
 | IBM Regatta P4 | 96 PEs
Joint Forces Command (JFCOM) | Xeon Cluster | 256 PEs

FY 04 new systems and/or upgrades

As of: April 2004

HPCMP Dedicated Distributed Centers (continued)

Location | System | Processors
Naval Air Warfare Center, Aircraft Division (NAWCAD) | SGI Origin 2000 | 30 PEs
 | SGI Origin 3900 | 64 PEs
Naval Research Laboratory-DC (NRL-DC) | Sun Sunfire 6800 | 32 PEs
 | Cray MTA | 40 PEs
 | SGI Altix | 128 PEs
 | SGI Origin 3000 | 128 PEs
Redstone Technical Test Center (RTTC) | SGI Origin 3900 | 28 PEs
Simulations & Analysis Facility (SIMAF) | SGI Origin 3900 | 24 PEs
 | Beowulf Cluster |
Space and Naval Warfare Systems Center-San Diego (SSCSD) | Linux Cluster | 128 PEs
 | IBM Regatta P4 | 128 PEs
White Sands Missile Range (WSMR) | Linux Networx | 64 PEs

FY 04 new systems and/or upgrades

As of: April 2004

Center POCs

Name | Org | Web URL | Contact Information

Brad Comes | HPCMO | http://www.hpcmo.hpc.mil | 703-812-8205, [email protected]
Tom Kendall | ARL MSRC | http://www.arl.hpc.mil | 410-278-9195, [email protected]
Jeff Graham | ASC MSRC | http://www.asc.hpc.mil/ | 937-904-5135, [email protected]
Chris Flynn | AFRL Rome DC | http://www.if.afrl.af.mil/tech/facilities/HPC/hpcf.html | 315-330-3249, [email protected]
Dr. Lynn Parnell | SSCSD DC | http://www.spawar.navy.mil/sandiego/ | 619-553-1592, [email protected]
Maj Kevin Benedict | MHPCC DC | http://www.mhpcc.edu | 808-874-1604, [email protected]

Disaster Recovery

Retain a third copy of critical data at a hardened backup site so users can access their files from an alternate site in the event of disruption of their primary support site.

- Status:
  - All MSRCs, MHPCC, and ARSC will have "off-site" third-copy backup storage for critical data
  - On-going initiative
- Working with centers to document the kinds of data that would need to be recovered
- Implementation to begin Q1 FY05

User Interface Toolkit

Provide an API-based toolkit to the user community and developers that facilitates the implementation of web-based interfaces to HPC.

Facilitates information integration.
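The slides do not show the toolkit's actual API, so the sketch below is purely hypothetical: every class name, field, and function here is invented to illustrate what an API-based layer between a web interface and a center's batch system could look like.

```python
# Hypothetical illustration only: no names below come from the HPCMP toolkit.
import json
from dataclasses import dataclass, asdict

@dataclass
class JobRequest:
    center: str            # e.g. "ARL" (illustrative identifier)
    processors: int
    walltime_hours: float
    command: str

def to_payload(job: JobRequest) -> str:
    """Serialize a job request; a web front end would send this to a
    center gateway, which maps it onto the local batch system."""
    return json.dumps(asdict(job))

print(to_payload(JobRequest("ARL", 64, 2.0, "./run_model")))
```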

Baseline Configuration

Implement and sustain a common set of capabilities and functions across the HPCMP centers.

Enables users to move easily between centers without having to learn and adapt to unique configurations.

Software Applications Support

Software applications support spans five elements: PET Partners, HPC Applications Institutes, NDSEG fellowships, Software Portfolios, and HPC Software Protection.

PET Partners:
- Lasting impact on services
- Transfer of new technologies from universities
- High-value service programs
- On-site support
- Training

Software Portfolios:
- Tightly integrated software
- Address top DoD S&T and T&E problems

HPC Software Protection:
- Assure software intended use/user
- Protect software through source insertion

HPC Software Applications Institutes and Focused Portfolios

- 5-8 HPC Software Applications Institutes, each with an HPC-SAI manager, service management, and a project team (HSAI at/for a given site)
  - HPCMP chartered
  - Service managed
  - 3-6 year duration; ends with transition to local support
  - $0.5-3M annual funding for:
    - 3-12 computational and computer scientists
    - Support of development of new and existing codes
    - Adjusting local business practice to use science-based models and simulation
  - Integrated with PET
- PET on-sites: $8-12M
- Focused portfolios (computational projects): $8-12M

HPC Computational Fellowships

- Patterned after the successful DOE fellowship program
- National Defense Science and Engineering Graduate (NDSEG) Fellowship Program chosen as the vehicle for execution of fellowships
  - HPCMP added as a fellowship sponsor along with the Army, Navy, and Air Force
  - Computer and computational sciences added as a possible discipline
- HPCMP is sponsoring 11 fellows for 2004 and similar numbers each following year
- HPCMP fellows are strongly encouraged to develop close ties with DoD laboratories or test centers, including summer research projects
- User organizations have responded to the DUSD (S&T) memo with fellowship POCs to select and interact with fellows

HPCMP Resource Allocation Policy: Capability Allocations

Goal: support the top capability work.

How:
- New TI-XX resources generally are implemented for a few months before the end of the current fiscal year without formal allocation
- Dedicate major fractions of large new systems, for the first 2-3 months of life, to short-term, massive computations that generally cannot be addressed under normal shared-resource operations
- HPCMP issued a call for short-term Capability Application Project (CAP) proposals
- Capability Application Projects will be implemented between October and December on large new systems each year
  - Proposals are required to show that the application efficiently uses on the order of 1,000 processors or more and would solve a very difficult, important short-term computational problem

Status of Capability Applications Projects

- Call released to the HPCMP community on 22 April 2004, with responses sent to HPCMPO by 1 June 2004
  - 21 proposals received across all large CTAs (CSM, CFD, CCM, CEA, and CWO)
- CAPs will be run on the new 3,000-processor Power4+ at NAVO and the 2,100-processor Xeon and 2,300-processor Opteron clusters at ARL
- CAPs will be run in two phases:
  - Exploratory phase, designed to test scalability and efficiency of application codes at significant fractions of the systems (5-15 projects on each system)
  - Production phase, designed to accomplish significant capability work with efficient, scalable codes (1-3 projects on each system)
- The production phase of CAPs will be run after normal acceptance testing and pioneer work on these systems

"Open Research" Systems

- In response to customer demand: ~50% of Challenge Project leaders prefer to use an "open research" system
- "Open research" systems concentrate on basic research, allowing better separation of sensitive and non-sensitive information
  - Minimal background check, facilitating graduate student and foreign national access
- For FY05, the systems at ARSC will transition into an "open research" mode of operation
  - Eliminates the requirement for users of that system to have NACs
  - Customers would have to "certify" that their work is unclassified and non-sensitive (e.g., open literature, basic research)
  - All other operational and security policies apply; for example, all users of HPCMP resources must be valid DoD users assigned to a DoD computational project
  - Consistent with the Uniform Use-Access Policy
- The account application process for "open research" centers or systems requires certification by the government program manager that the computational work is cleared for open-literature publication
  - Component of the FY 2005 account request
- Operations on all other systems remain under current policies

On-demand (Interactive) Systems

- The "real-time" community has asked for "guaranteed" or on-demand service from shared resource centers
  - The request is aimed at ensuring quick response time from a shared resource when the system is being used interactively
  - Results are needed now; they can't wait
- Current policy requires that all Service/Agency work be covered by an allocation
  - Note: an "on-demand" system will have lower utilization but fast turnaround
  - Service "valuation" of this service is demonstrated by FY05 allocations; sufficient allocation is needed to dedicate a system to this mode of support
- Anticipating that the Services/Agencies will allocate sufficient time to dedicate one 256-processor cluster at ARL

On-Demand Application: Distributed Interactive HPC Testbed

- Goal: assess the potential value and cost of providing greater interactive access to HPC resources to the DoD RDT&E community and its contractors
- Means: provide both unclassified and classified distributed HPC resources to the DoD HPC community in FY05 for interactive experimentation exploring new applications and system configurations

Distributed Interactive HPC Testbed

[Diagram: networked HPCs on the Defense Research and Engineering Networks, with remote users. Unclassified systems in black, classified systems in red: AFRL Coyote and Wile; MHPCC Koa cluster; ASC Mach 2 and Glenn; ARL Powell; SSCSD Seahawk and Seafarer.]

- Distributed HPCs
- Accessed by authorized users anywhere on the DREN and Internet
- Interactive and time-critical problems

Distributed Interactive HPC Testbed: Technical Challenges

- Low-latency support for interactive and real-time applications: what is the proper HPC configuration?
- Cohabitation of interactive and batch jobs?
- Web-based access to a network of HPCs with enhanced usability
- Consistency with the HPCMP-approved secure environment using DREN and SDREN
- Information management system supporting distributed HPC applications
- Demonstrating new C4ISR applications of HPC
- Expanding FMS use beyond Joint experimentation to include training and mission rehearsal

Example On-Demand Experiment: Interactive Parallel MATLAB

- Objective: provide SIP users with a high-productivity Interactive Parallel MATLAB environment (the user-friendly MATLAB high-level language syntax plus the computational power of the interactive HPCs)
- Allow interactive experiments for demanding SIP problems: problems that take too long to finish on a single workstation, or that require more memory than is available on a single computer, or with both constraints, in which users' research may benefit from an interactive modus operandi
- Approach: use MatlabMPI or other viable Parallel MATLAB approaches to deliver parallel execution while keeping the familiar MATLAB interactive environment
- It may also serve as a vehicle to collect experimental data about productivity issues: are SIP users really more productive on such an Interactive HPC MATLAB platform (versus traditional batch-oriented HPCs)?
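The slides name MatlabMPI as one candidate approach but show no code, so here is a rough analogy in Python using the real mpi4py library: the same scatter / compute / gather pattern a parallel SIP job would use. The signal and filter are made up for illustration.

```python
# Analogy to the MatlabMPI approach, written with mpi4py.
# Run with: mpiexec -n 4 python demo.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

chunks = None
if rank == 0:
    signal = np.random.rand(size * 1000)          # toy SIP workload
    chunks = np.array_split(signal, size)

chunk = comm.scatter(chunks, root=0)              # distribute work
smoothed = np.convolve(chunk, np.ones(5) / 5, mode="same")  # local filtering
gathered = comm.gather(smoothed, root=0)          # collect results

if rank == 0:
    print("processed", sum(len(c) for c in gathered), "samples")
```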

DIHT High Performance Computers

Site | Computer | Memory and I/O | Online
ARL MSRC, Aberdeen, MD | Unclass - Powell: 128-node dual 3.06 GHz Xeon cluster | 2 GB DRAM and 64 GB disk/node; Myrinet and GigE; 100 MB backplane | Est. 10/04 w/batch; 4/05 share with batch
ASC MSRC, Dayton, OH | Unclass - Mach2: 24-node dual 2.66 GHz Xeon, Linux | 4 GB DRAM and 80 GB disk/node; dual GigE | Est. 10/04
 | Class - Glenn: 128-node dual Xeon, Linux | 4 GB DRAM and local disks | Est. Spring 05
AFRL, Rome, NY | Unclass - Coyote: 26-node dual 3.06 GHz Xeon, Linux | 6 GB DRAM and 400 GB disk/node; dual GigE | Yes
 | Class - Wile: 14-node dual 2.66/3.06 GHz Xeon, Linux | 6 GB DRAM and 200 GB disk/node; dual GigE | Est. 12/04
SSCSD, San Diego, CA | Unclass - Seahawk: 16-node 1.3 GHz Itanium2, Linux | 2 GB DRAM and 36 GB disk/node; dual GigE | Est. 12/04
 | Class - Seafarer: 24-node dual 3.06 GHz | 4 GB DRAM and 80 GB disk/node; dual GigE | Yes (U) til 3/05
MHPCC, Maui, HI | Unclass/Class - Koa: 128-node dual Xeon, Linux (system moves between environments) | 4 GB DRAM and 80 GB disk/node; shared file system; dual GigE | Yes

Key Technical Users

Name | Program | Contact Information

Dr. Richard Linderman | HPC for Information Management | 315-330-2208, [email protected]
Dr. Bob Lucas | USJFCOM J9 | 310-448-9449, [email protected]
Dr. Stan Ahalt | PET - SIP CTP | 614-292-9524, [email protected]
Dr. Juan Carlos Chaves | Interactive Parallel MATLAB | 410-278-7519, [email protected]
Dr. Dave Pratt | SBA Force transformations | 407-243-3308, [email protected]
Rob Ehret | Grid-based Collaboration | 937-904-9017, [email protected]
Bill McQuay | Grid-based Collaboration | 937-904-9214, [email protected]
Dr. John Nehrbass | Web enabled HPC | 937-904-5139, [email protected]
Dr. Keith Bromley | Signal Image Processing | 619-553-2535, [email protected]
Dr. George Ramseyer | Hyperspectral Image Exploitation | 315-330-3492, [email protected]
Richard Pei | Interactive Electromagnetics Sim | 732-532-0365, [email protected]
Dr. Ed Zelnio | 3-D SAR Radar Imagery | 937-255-4949 ext. 4214, [email protected]
John Rooks | Swathbuckler SAR Radar Imagery | 315-330-2618, [email protected]

Department of Defense High Performance Computing Modernization Program

HPCMP Benchmarking and Performance Modeling Activities

http://www.hpcmo.hpc.mil

Performance Measurement Goals

- Provide quantitative measures to support selection of computers in the annual procurement process (TI-XX)

- Develop an understanding of our key application codes, via application code profiling (Levels 1, 2, and 3), for the purpose of guiding code developers and users toward more efficient applications and machine assignments

- Replace the current application benchmark suite with a judicious choice of synthetic benchmarks that could be used to predict performance of any HPC architecture on the program's key applications

Resource Management: Integrated Requirements/Allocation/Utilization Process

The program manages resources through an integrated cycle of requirements, allocation, and utilization, which informs both operations and acquisition decisions:

Requirements Process
- Bottoms-up survey
- Includes only approved, funded S&T/T&E projects
- Reviewed and validated by S&T/T&E executives

Allocation Process
- 75% Service/Agency, 25% DoD Challenge Projects
- Services/Agencies decide allocation resources for each project
- Reconcile capacity with requirements (first-order prioritization)

Utilization Tracking
- Track utilization by project
- Monitor turnaround time for timely execution
- Summary report sent to each HPC center

User Feedback
- Direct feedback from PIs and individual users
- Issues addressed and resolved
- User satisfaction impacts requirements, allocation, and utilization statistics

Requirements data feed initial requests for allocation; utilization feedback supports oversight and management and helps quantify requirements for further allocation.

Technology Insertion (TI) Flow Chart

[Flow: a requirements update drives updates to the benchmarks (applications and synthetics) and to the selection criteria (performance, usability, and price/performance); an acquisition call is issued to HPC vendors; vendors are invited and prepare bids, including guaranteed benchmark performance results; the program evaluates bids and builds possible solution sets, then evaluates results and negotiates the final deal; finally, system(s) are delivered, benchmark tests are run, and system(s) are accepted.]

Types of Benchmark Codes

- Synthetic codes
  - Basic hardware and system performance tests
  - Meant to determine expected future performance
  - Scalable, quantitative synthetic tests will be used for scoring; others will be used as system performance checks by the Usability Team
- Application codes
  - Actual application codes as determined by requirements and usage
  - Meant to indicate current performance

Percentage of Unclassified Non-Real-Time Requirements, Usage, and Allocations

CTA | Requirements % FY [2002] (2003) {2004} | Usage % FY 2002 {2003} | Usage % FY 2003 {2004} | Average Allocation % FY [2002] (2003) {2004} (25% FY 2004 req., 25% FY 2003 usage, 50% FY 2004 alloc.)
CFD | [35.5%] (36.9%) {38.6%} | 48.3% {37.2%} | 40.7% {44.4%} | [43.3%] (41.6%) {41.2%}
CCM | [15.5%] (18.6%) {16.2%} | 16.4% {21.2%} | 14.2% {12.6%} | [14.2%] (15.9%) {15.7%}
CWO | [21.9%] (19.2%) {20.8%} | 21.3% {23.1%} | 21.9% {17.6%} | [23.3%] (21.1%) {19.8%}
CEA | [4.1%] (4.0%) {4.8%} | 5.1% {4.8%} | 8.2% {6.6%} | [4.9%] (6.4%) {5.7%}
CSM | [11.4%] (11.8%) {11.7%} | 3.5% {7.5%} | 9.6% {11.0%} | [8.3%] (8.6%) {10.3%}
EQM | [3.0%] (3.2%) {2.1%} | 0.6% {1.6%} | 4.0% {3.1%} | [2.3%] (3.0%) {2.4%}
SIP | [1.0%] (1.4%) {1.4%} | 1.2% {1.1%} | 0.2% {0.4%} | [0.4%] (0.7%) {0.8%}
CEN | [0.5%] (0.4%) {0.6%} | 1.3% {1.2%} | 0.1% {1.2%} | [1.4%] (0.5%) {1.1%}
IMT | [2.9%] (0.8%) {0.8%} | 2.1% {0.7%} | 0.7% {1.9%} | [0.9%] (1.1%) {1.3%}
Other | [1.3%] (1.2%) {0.2%} | 0.1% {0.8%} | 0.2% {0.7%} | [0.4%] (0.4%) {0.6%}
FMS | [2.9%] (2.6%) {2.9%} | 0.2% {0.8%} | 0.2% {0.4%} | [0.7%] (0.8%) {1.1%}
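The "average allocation" column is defined in the header as a fixed-weight blend. A small sketch of that arithmetic; the standalone FY 2004 allocation figure below is a made-up stand-in, since the table reports only the blended result:

```python
# Blend per the table header: 25% FY04 requirements + 25% FY03 usage
# + 50% FY04 allocations. Requirements/usage taken from the CFD row;
# the raw FY04 allocation value (42.0) is hypothetical.
req_fy04, usage_fy03, alloc_fy04 = 38.6, 40.7, 42.0   # percent
average = 0.25 * req_fy04 + 0.25 * usage_fy03 + 0.50 * alloc_fy04
print(f"{average:.1f}%")
```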

TI-05 Application Benchmark Codes

- Aero: aeroelasticity CFD code (single test case) (Fortran, serial vector, 15,000 lines of code)
- AVUS (Cobalt-60): turbulent flow CFD code (Fortran, MPI, 19,000 lines of code)
- GAMESS: quantum chemistry code (Fortran, MPI, 330,000 lines of code)
- HYCOM: ocean circulation modeling code (Fortran, MPI, 31,000 lines of code)
- OOCore: out-of-core solver (Fortran, MPI, 39,000 lines of code)
- RFCTH2: shock physics code (~43% Fortran / ~57% C, MPI, 436,000 lines of code)
- WRF: multi-agency mesoscale atmospheric modeling code (single test case) (Fortran and C, MPI, 100,000 lines of code)
- Overflow-2: CFD code originally developed by NASA (Fortran 90, MPI, 83,000 lines of code)

TI-04 Benchmark Weights

CTA | Benchmark | Size | Unclassified % | Classified %
CSM | RF-CTH | Standard | a% | A%
CSM+CFD | RF-CTH | Large | b% | B%
CFD | Cobalt60 | Standard | c% | C%
CFD | Cobalt60 | Large | d% | D%
CFD | Aero | Standard | e% | E%
CEA+SIP | OOCore | Standard | f% | F%
CEA+SIP | OOCore | Large | g% | G%
CCM+CEN | GAMESS | Standard | h% | H%
CCM+CEN | GAMESS | Large | i% | I%
CCM | NAMD | Standard | j% | J%
CCM | NAMD | Large | k% | K%
CWO | HYCOM | Standard | l% | L%
CWO | HYCOM | Large | m% | M%
Total | | | 100.00% | 100.00%

Emphasis on Performance

- Establish a DoD standard benchmark time for each application benchmark case
  - NAVO IBM Regatta P4 (Marcellus) chosen as the standard DoD system for TI-04 (initially the IBM SP P3, HABU)
- Benchmark timings (at least three on each test case) are requested for systems that meet or beat the DoD standard benchmark times by at least a factor of two (preferably up to four)
- Benchmark timings may be extrapolated provided they are guaranteed, but at least one actual timing on the offered or a closely related system must be provided

CTH Standard, NAVO IBM SP P3 (1288 processors)

[Plot: 1/time versus number of processors for CTH, fit to the power law 1/time = 4.57590E-05 * x^0.715387, where the prefactor is labeled the "slope", the exponent the "curvature", and R^2 = 0.994381 the "goodness of fit"; x = number of processors, y = 1/time.]
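The fitted curve above is an ordinary power law, so it can be recovered by linear least squares in log-log space. A sketch with made-up timing data (the real CTH timings are not in the slides):

```python
# Fit 1/time = a * procs**b by least squares on the logs; the slide's
# real-data fit gave a = 4.5759e-5, b = 0.715, R^2 = 0.9944.
import numpy as np

procs = np.array([4.0, 8.0, 16.0, 32.0, 64.0])
times = np.array([520.0, 330.0, 210.0, 135.0, 90.0])   # hypothetical seconds

y = np.log(1.0 / times)
b, log_a = np.polyfit(np.log(procs), y, 1)

pred = log_a + b * np.log(procs)
r2 = 1.0 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(f"1/time = {np.exp(log_a):.3e} * procs^{b:.3f}, R^2 = {r2:.4f}")
```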

HPCMP System Performance (Unclassified)

[Chart: normalized HABU equivalents per system, FY 2003 and FY 2004, for the Cray T3E, IBM P3, SGI O3800, IBM P4, HP SC40, HP SC45, Cray X1, and SGI O3900, with one off-scale value annotated at ~40; n = number of application test cases not included (out of 13 total).]

How the Optimizer Works: Problem Description

KNOWN: the application score matrix (scores S for each machine on each application test case code), machine prices ($), budget limits, and the overall desired workload distribution (% by application test case code).

UNKNOWN: the optimal quantity (#) of each machine and the resulting workload distribution across machines.

Objective: optimize total price/performance.
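A minimal sketch of the optimization the slide describes, with invented prices and scores: enumerate machine quantities under a budget and keep the best total performance per dollar. The production optimizer is certainly more sophisticated; this only shows the shape of the problem.

```python
# Toy acquisition optimizer: all prices, scores, and limits are made up.
from itertools import product

machines = {"A": (2.0, 3.1), "B": (3.5, 6.0), "C": (1.2, 1.7)}  # $M, score
budget = 12.0

best = (0.0, None)
for qty in product(range(7), repeat=len(machines)):
    cost = sum(q * p for q, (p, _) in zip(qty, machines.values()))
    perf = sum(q * s for q, (_, s) in zip(qty, machines.values()))
    if 0 < cost <= budget and perf / cost > best[0]:
        best = (perf / cost, dict(zip(machines, qty)))

print(best)   # highest performance-per-dollar mix within budget
```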

Price Performance Based Solutions

System | Processors | Opt #1 | Opt #2 | Opt #3 | Opt #4
A | 64 | 1 | 1 | 0 | 0
B | 188 | 0 | 2 | 3 | 0
C | 128 | 0 | 0 | 0 | 4
C | 256 | 0 | 2 | 4 | 0
D | 256 | 15 | 0 | 0 | 12
D | 512 | 0 | 4 | 1 | 1
E | 256 | 1 | 1 | 3 | 0
Performance / Life Cycle | | 3.03 | 3.02 | 2.97 | 2.95

The optimizer produces a list of system solutions in rank order based upon performance / life-cycle cost.

Capturing True Performance

[Charts: capacity of the large centers (AHPCRC, ARL, ARSC, ASC, ERDC, MHPCC, SMDC, NAVO) at the end of TI-03, shown both in HABU equivalents and in peak GFLOPS-years. Top 500 rank or peak GFLOPS is not a measure of real performance.]

Requirement Trends

[Chart: HPCMP requirements (GF-years) versus fiscal year on a semi-log scale, 1996-2010, with survey data and trend lines for 1997 through 2003.]

The slope of this semi-log plot for the entire set of data equates to a constant factor of 1.76 ± 0.26 per year, although the slopes for the last two years have been 1.42 and 1.48, respectively.
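As a sketch of what that factor means in practice: a constant yearly growth factor g is a straight line on a semi-log plot, and g = 1.76 doubles requirements roughly every 15 months. Illustrative numbers only:

```python
# Extrapolate a constant yearly growth factor (the slide's fitted 1.76).
import math

g = 1.76
req = 1.0                       # normalized FY 1996 requirement
for year in (1996, 2000, 2004, 2008):
    print(year, round(req * g ** (year - 1996), 1))
print("doubling time:", round(math.log(2) / math.log(g), 2), "years")
```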

Supercomputer Price-Performance Trends

[Chart: HPC price-performance (performance per dollar), 2001-2004, for FLOPS/$ and HABUs/$, compared against Moore's Law and CBO's TFP, each with exponential fits; annotated yearly factors: 1.95, 1.68, 1.58, and 1.2.]

Department of Defense High Performance Computing Modernization Program

HPCMP Benchmarking and Performance Modeling

Challenges & Opportunities

http://www.hpcmo.hpc.mil

Benchmarks: Today and Tomorrow

Today
- Dedicated applications (80% weight): real codes; representative data sets
- Synthetic benchmarks (20% weight): future look; focus on key machine features

Tomorrow
- Synthetic benchmarks (100% weight): coordinated to application "signature"; performance on real codes accurately predicted from synthetic benchmark results; supported by genuine "signature" databases

The next 1-2 years are key: we must prove that synthetic benchmarks and application "signatures" can be coordinated.
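One way to read "coordinated to application signature": profile an application as counts of basic operations, measure each machine's rates on those operations with synthetic probes, and predict runtime by combining the two. The crude additive model and all numbers below are illustrative assumptions, not the program's actual method:

```python
# Crude signature-based prediction sketch (hypothetical numbers throughout).
signature = {"flops": 4.2e12, "mem_bytes": 1.8e13, "net_bytes": 6.0e11}

rates_per_pe = {            # measured by synthetic probes, per processor
    "flops": 2.0e9,         # sustained flop/s
    "mem_bytes": 1.0e9,     # sustained memory bytes/s
    "net_bytes": 2.5e8,     # sustained network bytes/s
}

pes = 64
predicted = sum(signature[k] / (rates_per_pe[k] * pes) for k in signature)
print(f"predicted runtime on {pes} PEs: {predicted:.0f} s")
```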

How: Application Code Profiling Plan

- Began at the behest of the HPC User Forum in partnership with NSA
- Has evolved into a multi-year plan for understanding how key application codes perform on HPC systems
  - Maximizing use of current HPC resources
  - Predicting performance of future HPC resources
- Performers include:
  - Programming Environment and Training (PET) partners
  - Performance Modeling and Characterization Laboratory (PMaC) at SDSC
  - Computational Science and Engineering Group at ERDC
  - Instrumental, Inc.
- Research and production activities include:
  - Profiling key DoD application codes at several different levels
  - Characterizing HPC systems with a set of system probes (synthetic benchmarks)
  - Predicting HPC system performance based on application profiles
  - Determining a minimal set of HPC system attributes necessary to model performance
  - Constructing the appropriate set of synthetic benchmarks to accurately model the HPCMP computational workload for use in system acquisitions

Support for TI-05 (Scope and Schedule)

- Level 3 application code profiling
  - Eight application codes; 14 unique test cases
  - Each test case to be run at 3 different processor counts
- Predictions for existing systems
  - 21 systems at 7 centers (some overlap possible in predictions)
  - Benchmarking POCs identified for each center
  - Goal: benchmarking results and predictions complete by December 2004
- Predictions for offered systems
  - Goal: benchmarking results finalized by 19 November 2004; all predictions completed by 31 December 2004
- Sensitivity analysis
  - Goal: determine how accurate a prediction we need

Should We Do Uncertainty Analysis?
Performance Prediction Uncertainty Analysis

- Overall goal: understand and accurately estimate uncertainties in performance predictions
- Determine the functional form of the performance prediction equations and develop an uncertainty equation
- Determine uncertainties in underlying measured values from system probes and application profiling, and use the uncertainty equation to estimate uncertainties
- Compare results of performance prediction to measured timings, and uncertainties of these results to predicted uncertainties
- Assess uncertainties in measured timings and determine whether acceptable agreement is obtained
- Eventual goal: propagate uncertainties in performance prediction to determine uncertainties in acquisition scoring

Performance Modeling Uncertainty Analysis

- Assumption: uncertainties in measured performance values can be treated as uncertainties in measurements of physical quantities
- For small, random uncertainties in measured values x, y, z, ..., the uncertainty in a calculated function q(x, y, z, ...) can be expressed as:

$$\delta q = \sqrt{\left(\frac{\partial q}{\partial x}\,\delta x\right)^{2} + \cdots + \left(\frac{\partial q}{\partial z}\,\delta z\right)^{2}}$$

- Systematic errors need careful consideration since they cannot be calculated analytically
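A numerical sketch of the formula above, using central differences in place of analytic partial derivatives; the model and inputs are made up for illustration:

```python
# delta_q = sqrt(sum_i (dq/dx_i * delta_x_i)^2), with partials taken by
# central differences. Example model: a power law q(a, b) = a * n**b.
import math

def q(a, b, n=32.0):
    return a * n ** b

def propagate(f, args, deltas, h=1e-7):
    total = 0.0
    for i, d in enumerate(deltas):
        hi, lo = list(args), list(args)
        hi[i] += h
        lo[i] -= h
        partial = (f(*hi) - f(*lo)) / (2.0 * h)   # df/dx_i
        total += (partial * d) ** 2
    return math.sqrt(total)

# Hypothetical fitted values and their uncertainties:
print(propagate(q, [4.6e-5, 0.715], [0.2e-5, 0.02]))
```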

Propagation of Uncertainties in Benchmarking and Performance Modeling

[Flow: benchmark times T (uncertainty delta-T) feed a power-law least-squares fit that produces benchmark performance scores 1/T (uncertainties delta-P, sigma-P, sigma-S); the optimizer computes total price/performance for each solution set ($); averaging over spans of solution sets gives price/performance per solution set (sigma-TS) and a rank ordering of solution sets (% sigma-TS).]

U (EXIST+LC) Architecture % Selection by Processor Quantity for Varying Spans (TI-04)

[Chart: percent selection by architecture (Systems A through J) for a 1% span over the top 10,000 solution sets.]

Performance Measurement: Closing Thoughts

- Clearly identify your goals
  - Maximize the amount of work given fixed dollars and time
  - Alternative goals: power consumption, weight, volume
- Define the work flow
  - Production (run) time
  - Alternative goals: development time, problem set-up time, result analysis time
- Validate measures
  - Understand the error bounds
- Don't rely on "marketing" specifications!