From Computational Science to Science Discovery

From Computational Science to Science Discovery: The Next Computing Landscape

Gilad Shainer, Brian Sparks, Scot Schultz, Eric Lantz, William Liu, Tong Liu, Goldi Misra
HPC Advisory Council
{Gilad, Brian, Scot, Eric, William, Tong, [email protected]}

Computational science is the field of study concerned with constructing mathematical models and numerical techniques that represent scientific, social scientific or engineering problems, and with employing these models on computers, or clusters of computers, to analyze, explore or solve those problems. Numerical simulation enables the study of complex phenomena that would be too expensive or dangerous to study by direct experimentation. The quest for ever higher levels of detail and realism in such simulations requires enormous computational capacity, and has provided the impetus for breakthroughs in computer algorithms and architectures. Due to these advances, computational scientists and engineers can now solve large-scale problems that were once thought intractable by creating the related models and simulating them on high-performance compute clusters or supercomputers. Simulation is being used as an integral part of the manufacturing, design and decision-making processes, and as a fundamental tool for scientific research. Problems where high-performance simulation plays a pivotal role include, for example, weather and climate prediction, nuclear and energy research, simulation and design of vehicles and aircraft, electronic design automation, astrophysics, quantum mechanics, biology, computational chemistry and more.

Computational science is commonly considered the third mode of science, where the previous modes or paradigms were experimentation/observation and theory. In the past, science was performed by observing evidence of natural or social phenomena, recording measurable data related to the observations, and analyzing this information to construct theoretical explanations of how things work.
With the introduction of high-performance supercomputers, the methods of scientific research could include mathematical models and simulation of phenomena that are too expensive or beyond the reach of our experiments. In turn, we can forecast weather conditions sooner, explore alternative energy sources, build safer vehicles and package consumer goods in a more economical way. In order to perform those numerical simulations effectively and productively, cost-effective or commodity-based supercomputer architectures were created: high-performance clusters of computers.

High-performance computing (HPC) clusters are scalable performance compute solutions based on industry-standard hardware connected by a private high-speed system network. The main benefits of clusters are affordability, flexibility, availability, high performance and scalability. A cluster uses the aggregated power of compute server nodes to form a high-performance solution for parallel applications. When more compute power is needed, it can be achieved simply by adding more server nodes to the cluster. The Los Alamos National Lab (US) Roadrunner cluster (figure 1) was the first system to provide Petaflop (a thousand trillion floating point operations per second) performance for scientific simulations (national nuclear weapons, astronomy, human genome science and climate change). Roadrunner was built using IBM Cell CPUs and AMD Opteron CPU boards, with Mellanox InfiniBand connecting them. The Oak Ridge National Lab (US) Spider system is one of the world's largest and fastest storage cluster file systems; it includes thousands of connections (based on InfiniBand interconnect) and over 10.7 petabytes of storage capacity to serve the high-performance systems at the lab. The National University of Defense Technology (China) TianHe system (figure 2) is the first Petascale system in Asia. The system uses thousands of Intel CPUs and ATI GPUs, all connected via Mellanox InfiniBand networking.
Figure 1: Los Alamos National Lab Roadrunner system, the world's first Petaflop system

Figure 2: National University of Defense Technology TianHe system

With the creation of bigger and faster high-performance computing systems for scientific and engineering simulations, new generations of sensor-computer appliances have been created for specific applications. One example is the Australian Square Kilometre Array Pathfinder (ASKAP), an array of radio telescopes that will comprise 36 antennas, each 12 m in diameter, capable of high dynamic range imaging and using wide-field-of-view phased array feeds. ASKAP will be a telescope that can capture radio images with unprecedented sensitivity over large areas of sky. With a large instantaneous field of view, ASKAP will be able to survey the whole sky vastly faster than is possible with existing radio telescopes.

Figure 3: Illustration of the Australian Square Kilometre Array Pathfinder

Petaflop Supercomputers Create Exa-flood of Data

The ever increasing demands for computational power, met by ever increasing supercomputer capability and capacity, produce an overwhelming flow of data. In one week the Australian Square Kilometre Array Pathfinder will generate more information than is currently contained on the whole World Wide Web, and in one month it will generate more information than is contained in the world's academic libraries. A Petaflop supercomputer equals 150,000 computations for every human on the planet per second, and a single day's usage of the world's TOP500 supercomputers (according to the November 2009 list) is equal to 240 billion people armed with calculators for nearly 50 years.
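As a rough check on the per-person figure, the arithmetic can be sketched in a few lines of Python. The world population value used here (about 6.8 billion, roughly the 2009 figure) is an assumption and is not stated in the text; the function name is purely illustrative.

```python
# Rough check of the "computations per human per second" figure.
PETAFLOP = 10**15  # floating point operations per second

# Assumption: world population of roughly 6.8 billion (circa 2009).
# This value does not appear in the text.
WORLD_POPULATION = 6_800_000_000


def ops_per_person(flops: float, population: int) -> float:
    """Return the operations per second available per person."""
    return flops / population


if __name__ == "__main__":
    per_person = ops_per_person(PETAFLOP, WORLD_POPULATION)
    print(f"{per_person:,.0f} operations per person per second")
```

Under that assumption the result is on the order of 147,000 operations per person per second, consistent with the rounded figure of 150,000.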
With the increasing ramp of data generation from scientific and engineering simulations and observation-targeted supercomputers, future technology development should be focused on creating scalable high-performance clusters of computers that can manage and process all of this data. The future premise of compute infrastructures should be aimed at building or providing tools and systems for science discovery, in which all of the computational science literature and databases can be available online and be shared by scientists, researchers and engineers around the globe. Distributed science can be seen as the fourth mode or paradigm, where science becomes centralized through the centralization of computing facilities, and those computing facilities are then targeted at managing, visualizing and analyzing the data flood. Computational science drives the vast creation of data, which is beyond our capabilities to analyze and understand, and the role of science discovery will extend to creating the tools to extract future scientific discoveries out of the data flood.

Furthermore, in many scientific fields of study, the instruments are extremely expensive, and as such, the data must be shared. With this data explosion, and as high-performance systems become a commodity infrastructure, the pressure to share scientific data is increasing. That resonates well with the emerging computing trend known as the cloud, or cloud computing. While for the moment cloud computing appears to be a cost-effective alternative for IT spending, or the shift of enterprise IT centers from capital expense to operational expense, research institutes have started exploring how cloud computing can create the desired compute centralization and an environment for researchers to share and crunch the flood of data. One example is the new system at the National Energy Research Scientific Computing Center (US), named Magellan. While Magellan's initial target is to provide a tool for computational science in a cloud environment, it can be easily modified to become a center for data processing accessed by many researchers and scientists.
Centralized Data Crunching Compute Environment Through Cloud Computing

The concept of computing in a cloud typically refers to a hosted computational environment (which could be local or remote) that can provide elastic compute and storage services for users on demand. The current usage model of cloud environments is therefore aimed at computational science. Future clouds can serve as environments for distributed science, allowing researchers and engineers to share their data with their peers around the globe and allowing expensively achieved results to be utilized for more research projects and scientific discoveries.

To allow the shift to the fourth mode of science discovery, those cloud environments will need not only to provide the capability to share the data created by computational science and the various observation results, but also to provide cost-effective high-performance computing capabilities, similar to those of today's leading supercomputers, in order to rapidly and effectively analyze the data flood. Moreover, an important criterion for clouds is fast provisioning of cloud resources, both compute and storage, in order to service many users and many different analyses, and to be able to suspend tasks and bring them back to life quickly. Reliability is another concern: clouds need to be self-healing, so that failing components can be replaced by spares or on-demand resources to guarantee constant access and resource availability.
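The self-healing idea, where a failing component is swapped for a spare, can be illustrated with a toy model. This is only a minimal sketch under stated assumptions: the `ResourcePool` class, its method names and the node names are hypothetical and do not correspond to any real cloud management API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ResourcePool:
    """Toy model (hypothetical) of a cloud pool with active nodes and idle spares."""
    active: Set[str] = field(default_factory=set)
    spares: List[str] = field(default_factory=list)

    def report_failure(self, node: str) -> Optional[str]:
        """Retire a failed node and promote the next spare, if any is left.

        Returns the name of the replacement node, or None when no spare
        is available and capacity is therefore reduced.
        """
        if node not in self.active:
            raise ValueError(f"{node} is not an active node")
        self.active.remove(node)
        if not self.spares:
            return None
        replacement = self.spares.pop(0)
        self.active.add(replacement)
        return replacement


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    pool = ResourcePool(active={"node-1", "node-2"}, spares=["spare-1"])
    print(pool.report_failure("node-2"))  # the spare takes over
    print(sorted(pool.active))
```

A real system would also need failure detection, state migration and on-demand acquisition of new spares; the sketch covers only the replacement step.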
The use of Grids for scientific computing has become successful in the last few years, and many international projects have led to the establishment of world-wide infrastructures available for computational science. The Open Science Grid provides support for data-intensive research for different disciplines such as biology, chemistry, particle physics, and geographic information systems. Enabling Grids for E-sciencE (EGEE) is an initiative funded by the European Commission that connects more than 91 institutions in Europe, Asia, and the United States of America to construct the largest multi-science computing Grid infrastructure in the world. TeraGrid is an NSF-funded project that provides scientists with a large computing infrastructure built on top of resources at nine resource provider partner sites. It is used by 4000 users at over 200 universities that advance research in molecular bioscience, ocean science, earth science, mathematics, neuroscience, design and manufacturing, and other disciplines. While Grids can provide a good infrastructure for shared science and data analysis, several issues make them problematic for leading the fourth mode of science: limited software flexibility, applications that typically need to be pre-packaged, non-elasticity and lack of virtualization. Those missing items can be delivered through cloud computing.

Cloud computing addresses many of the aforementioned problems by means of virtualization technologies, which provide the ability to scale the computing infrastructure up and down according to given requirements. By using Cloud-based technologies, scientists can have easy access to large distributed infrastructures and completely customize their execution environment. Furthermore, effective provisioning can support many more activities and suspend or bring to life activities in an
