GRID TECHNOLOGIES
Ramon Nou Castell Feb-2006
GRID Technologies
● Introducing GRID computing
● GLOBUS TOOLKIT (GT4)
● UNICORE
● Installation
● Performance Tools for Grid
● Grid Monitoring
● CrossGrid
● Performance enhancement proposals
● Testing Performance
● Traces from GT4
● UNICORE/GLOBUS DEMO

Introducing Grid Computing
Definition
– Grid computing uses the resources of many separate computers connected by a network to solve large-scale computation problems.
– Anonymous resources

Important issues
● Heterogeneous resources
● Latency issues
● No exclusive systems
● Security issues

Introducing Grid Computing
[Figure: a user reaches the Grid through a Grid gateway, behind which sit heterogeneous resources: servers, a PC, a supercomputer, and a Mac.]

Introducing Grid Computing
Two performance views
● Small: zoom in, increase performance by modifying per-server internals
● Large: topology, routing algorithms
[Figure: thumbnails of the 4-job distribution traces, including one with a different distribution.]
Introducing Grid Computing
● Non-commercial GRID software:
– GLOBUS (www.globus.org)
  University of Chicago, University of Edinburgh, NCSA, Northern Illinois University, Royal Institute of Technology (Sweden), Univa Corporation, University of Southern California
  DARPA, US Dept. of Energy, NSF, NASA, UK e-Science, IBM, Microsoft Research, Cisco Systems Corporation
– UNICORE (www.unicore.org)
  CRAY, Fujitsu Siemens, Hitachi, IBM, NEC, HP, Intel, SGI, T-Systems, Fujitsu Laboratories of Europe

Introducing Grid Computing
● Commercial Grid software:
– Avaki (Sybase)
– Compaq Computer
– Data Synapse
– Entropia (Windows based)
– IBM
– Noemix (Sun)
– Parabon (Java based)
– Platform Computing
– Sun Microsystems (pay-per-use, $1 / CPU-hour)
– United Devices

GLOBUS TOOLKIT
● Several services and layers from GGF-defined protocols
– Resource management:
  ● Grid Resource Allocation & Management Protocol (GRAM)
– Information services:
  ● Monitoring and Discovery Service (MDS)
– Security services:
  ● Grid Security Infrastructure (GSI)
– Data movement and management:
  ● Global Access to Secondary Storage (GASS) and GridFTP

GT4 Architecture (layers)

Globus Tuning
From configuration files:
● Number of runQueues
● Timeout settings
● …

Hard-coded:
● ServiceThreads [similar to HTTPProcessors in Tomcat]
● Priority
● …
UNICORE
● Uniform Interface to Computing Resources
● In development since 1997
● Uses an Abstract Job Object (AJO)
– Translated on the target machine
● Full integration with VampirTrace via plugins
● Fewer software tiers than Globus
– But more overhead on job preparation...

UNICORE architecture
[Figure: UNICORE architecture. Layers shown, top to bottom: UNICORE user GUI and HTTP site list; SSL; UNICORE server Gateway; SSL; Network Job Supervisor; TCP/IP; Target System Interface; O.S. Two UNICORE servers are drawn side by side.]

UNICORE architecture
[Figure: UNICORE architecture diagram (user GUI, Gateway, Network Job Supervisor, Target System Interface, O.S.).]
● Vsite (Virtual Site)
– Collection of resources
● Gateway
– Sits in front of the site firewall
– Establishes connections between the Gateway and the NJS
– User authentication, secure communication

UNICORE architecture
[Figure: UNICORE architecture diagram (user GUI, Gateway, Network Job Supervisor, Target System Interface, O.S.).]
● NJS (Network Job Supervisor)
– Oversees the execution of UNICORE jobs for a Vsite
– Analyzes the AJO, maps user IDs, file transfers
– Job status
– Creates a temporary directory for every job
● TSI (Target System Interface)
– Interface to the local operating system and resource management system
– Can use several schedulers/resource managers, like Globus

UNICORE tuning
● Gateway
– gw.max_threads
– conn_timeout
● NJS
– njs.tsi_worker_limit = 5
  ● Number of TSI processes that will be created
– njs.tsi_update_interval = 5000
  ● Interval, in milliseconds, between job state updates
– threads.Incarnations = 3
  ● Number of threads allocated to the execution of Actions on the TSI ( >= # TSI )
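The constraint attached to threads.Incarnations above can be checked mechanically. The following is only a sketch: it assumes the NJS settings are available as plain `key = value` text, and because the slide does not say how "# TSI" is counted, the number of TSIs is passed in by the caller. The helper names `parse_settings` and `incarnation_threads_ok` are hypothetical and not part of UNICORE.

```python
def parse_settings(text):
    """Parse 'key = value' lines into a dict of strings, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def incarnation_threads_ok(settings, tsi_count):
    """True if threads.Incarnations >= tsi_count (tsi_count is supplied by the caller)."""
    return int(settings["threads.Incarnations"]) >= tsi_count


# Values taken from the slide; the layout of the text is an assumption.
example = """
njs.tsi_worker_limit = 5
njs.tsi_update_interval = 5000
threads.Incarnations = 3
"""

settings = parse_settings(example)
print(incarnation_threads_ok(settings, tsi_count=1))
```

Keeping the TSI count as an explicit argument avoids guessing which setting the slide meant by "# TSI".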
UNICORE client
● Easy to get performance information [high level]
● Resource management

UNICORE client
Interface to launch compilation jobs...
Aware of data type size

Installation
UNICORE
● Demo server for testing
● Easy to install
● Client graphical interface
– Easy to create and deploy jobs
– Graphical dependencies
● Demo certificates easy to use and install
● Needs root privileges (to run TSI)

Installation
GT4
● Huge download, needs to be compiled
● No separate downloads for client/server
● Needs root privileges (sudo)
● Certificates hard to use/install
● Needs a database installed

Performance Tools for Grid
● CrossGrid
– Verification and performance prediction tools; detection of performance bottlenecks in applications in Grid environments
● Global Grid Forum
– Standardization in the field of monitoring and performance analysis
● Key features
– Low overhead, low latency, transparent
– Scalable, secure
CrossGrid Architecture
[Figure: CrossGrid layered architecture.
Applications: biomedical application, flood application, HEP interactive distributed data access application, HEP data mining on Grid application, HEP high-level trigger, weather forecast application.
Supporting tools: portal, MPI verification, metrics and benchmarks, performance analysis.
Applications development support: MPICH-G.
Grid common services (CrossGrid, DataGrid, GLOBUS): interactive distributed data access, Grid visualisation kernel, data mining on Grid, roaming access, user interaction service, distributed data collection, Grid monitoring, DataGrid job manager, replica manager, replica catalog, Grid resource management, GRAM, GSI, Globus-IO, MDS, GridFTP, GASS.
Local resources: resource managers for CPU, secondary storage, tertiary storage, scientific instruments (medical scanners, satellites, radars), detector local high-level trigger, VR systems (caves, immersive desks), visualization tools, optimization of data access.]

CrossGrid Zoom
[Figure: CrossGrid Technical Annex, Fig. WP2-1. Components shown: MPI verification (2.2) working on application source code; benchmarks (2.3); performance analysis (2.4) with automatic analysis, performance measurement, visualization, and an analytical model; Grid monitoring (3.3); applications (WP1) executing on the Grid testbed (WP4). Annotation on the slide: "Not now needed".]
Grid Monitoring
● Two levels ( network and/or server )
● OCM-G (application information)
– Grid-enabled OMIS-Compliant Monitoring system
– Needs recompilation of applications
● SANTA-G (external instrumentation, TCPDUMP)
– Grid-enabled System Area Networks Trace Analysis
● JIRO (information from the Grid infrastructure)

Performance Tools
● OCM-G (Grid-enabled OMIS-Compliant Monitor)
– Needs application recompilation
– Target: applications
– Works on and gets information from all nodes
● JIS (Java Instrumentation Suite)
– Target: system-wide monitoring (OS, JVM, application)
– Needs to be deployed on every node
– No recompilation

Performance enhancement proposals
www.gridcore.org
– GFD-5 Advance Reservation API
● Reserve CPU or network bandwidth in advance
– Also disk and graphics pipeline
● Can enable more intelligent scheduling schemes
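To make the idea of advance reservation concrete, here is a toy reservation table for a single resource with fixed capacity (for example, a number of CPUs). It is a sketch only: it is not the GFD-5 API, and the class and method names are hypothetical. A scheduler can use this kind of check to refuse a request that would oversubscribe the resource at any moment in the requested window.

```python
class ReservationBook:
    """Toy advance-reservation table for one resource with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.reservations = []  # (start, end, amount), with end exclusive

    def used_at(self, t):
        """Amount already reserved at time t."""
        return sum(a for s, e, a in self.reservations if s <= t < e)

    def reserve(self, start, end, amount):
        """Book `amount` units for [start, end) if capacity is never exceeded."""
        if start >= end or amount > self.capacity:
            return False
        # Usage only rises at reservation starts, so it is enough to check
        # the new start and every existing start inside the window.
        points = [start] + [s for s, _, _ in self.reservations if start < s < end]
        if any(self.used_at(t) + amount > self.capacity for t in points):
            return False
        self.reservations.append((start, end, amount))
        return True


cpus = ReservationBook(capacity=2)
print(cpus.reserve(0, 10, 1))   # first CPU booked for [0, 10)
print(cpus.reserve(5, 15, 1))   # second CPU booked for [5, 15)
print(cpus.reserve(8, 12, 1))   # would need a third CPU at t = 8
print(cpus.reserve(10, 12, 1))  # fits once the first booking has ended
```

Times are plain numbers here; a real system would use timestamps and would also have to handle cancellation and expiry.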
Testing Performance
25 sequential jobs of [/bin/sleep 10]
Hardware
– Pentium M 1.8 GHz
– 512 MB
UNICORE
– Time: 396 seconds
– Overhead: 37 % (6 seconds per job)
GLOBUS
– Time: 301 seconds
– Overhead: 17 % (2 seconds per job)
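The overhead figures above are consistent with taking the 25 × 10 s = 250 s of pure sleep time as the baseline and expressing the extra time as a share of the total run. A minimal sketch of that arithmetic, assuming this is how the percentages were derived (the function name is illustrative):

```python
def overhead(total_seconds, jobs=25, job_seconds=10):
    """Return (overhead as a share of total time, overhead seconds per job)."""
    baseline = jobs * job_seconds      # time spent inside /bin/sleep itself
    extra = total_seconds - baseline   # everything the middleware adds
    return extra / total_seconds, extra / jobs


for name, total in (("UNICORE", 396), ("GLOBUS", 301)):
    share, per_job = overhead(total)
    print(f"{name}: {share:.0%} overhead, {per_job:.1f} s per job")
```

This gives about 37 % and 5.8 s per job for UNICORE, and about 17 % and 2.0 s per job for GLOBUS, matching the rounded values reported.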
Testing Performance
But
● UNICORE: 1 % CPU
● GLOBUS: 5 % CPU

Hard to measure (for this kind of test)
– GLOBUS uses more than 60 % CPU on some peaks (top)
– Sleep doesn't use CPU time...
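The last point can be seen directly by comparing wall-clock time with CPU time for a sleeping task. This is a small illustration, not the measurement method used for the figures above (those came from top): it uses Python's `time.perf_counter` for elapsed time and `time.process_time` for CPU time of the current process, with a short sleep standing in for /bin/sleep 10.

```python
import time


def measure(action):
    """Run action() and return (wall-clock seconds, CPU seconds of this process)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    action()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


# Short sleep as a stand-in for /bin/sleep 10.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall = {wall:.2f} s, cpu = {cpu:.3f} s")
```

Because the job itself consumes almost no CPU, whatever CPU load is observed during such a test comes from the middleware, which is why short peaks dominate the picture.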
Performance Issues
SOAP processing Numerical to ASCII encoding ● sprintf("
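The cost of numerical-to-ASCII encoding can be illustrated by comparing a binary and a text encoding of the same floating-point values. This is a sketch in Python, not SOAP itself: the function name is illustrative, the binary form is a packed array of IEEE 754 doubles, and the text form uses Python's shortest round-trip representation separated by spaces, with no XML markup around it.

```python
import struct


def encoded_sizes(values):
    """Return (bytes as packed doubles, bytes as space-separated ASCII text)."""
    binary = struct.pack(f"<{len(values)}d", *values)   # 8 bytes per value
    text = " ".join(repr(v) for v in values).encode("ascii")
    return len(binary), len(text)


values = [0.1, 1 / 3, 2.5e-7]
binary_size, text_size = encoded_sizes(values)
print(f"binary: {binary_size} bytes, text: {text_size} bytes")
```

Size is only part of the cost: each value also has to be formatted on the sender and parsed on the receiver, and real SOAP messages add element tags around every number.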
Not only a Grid problem; a more general Web Services (WS) issue.
● Layers
– Increase overhead; the hardware is far, far away...
● Security…
– SSL overhead

4-Job Performance Study
Typical 4-job Simultaneous Trace with 2-CPU
4-Job Distribution
Needs more overlapping
4-Job Distribution
Different distribution

4-Job Distribution
● The JDK version introduces changes in threads and priority.
● Changing job type (CPU-intensive vs. I/O)
● Is 1.5 better than 1.4? Not always: script calls take twice as long as in 1.4.
UNICORE/GLOBUS DEMO

Bibliography
● The UNICORE Grid and its Options for Performance Analysis, http://www.fz-juelich.de/zam/vsgc/pub/romberg-2003-UGO.pdf
● www.unicore.org
● www.globus.org
● www.crossgrid.org
● www.gridforum.org
● http://www.extreme.indiana.edu:1947/xgws/papers/soap-hpdc2002/soap-hpdc2002.pdf – SOAP for high performance computing