Explorations in the Grid/WS Computing Jungle
A Narrative of the Reverse-Engineering Attempts of a “New” Distributed Paradigm
François Taïani, Matti Hiltunen & Rick Schlichting
Opera Talk, 6 May 2008, Comp. Dept., Cambridge University

"The best and safest method of philosophizing seems to be, first to inquire diligently into the properties of things, and to establish those properties by experiences, and then to proceed more slowly to hypotheses for the explanation of them. For hypotheses should be employed only in explaining the properties of things, but not assumed in determining them; unless so far as they may furnish experiments." — Isaac Newton
2 Context: Middleware Complexity
ORBacus Request Processing
One request: 2,065 individual invocations, spanning over 50 C functions and 140 C++ classes.
3 Web Services’ Bright New World
Grid Computing: federating resources over the Internet
Web Services: integrating services
Globus (Argonne Lab.): reference implementation
Large, complex, collaborative middleware (IBM, Apache,...)
Very poor performance: up to 30s to create a simple distributed object (a counter); up to 2s for a round-trip remote add operation on this counter
Where does it come from? Does it tell us something about modern middleware development?
4 Globus
Reference implementation of the Grid standards, developed by the "Globus Alliance", a partnership around Argonne National Laboratory.
Globus is a high-level "middleware" (software glue). It offers services to share and use remote distributed "resources" (CPU, memory, DB, bandwidth).
Since version 3.9.x, Globus uses Web Services for "connectivity".
Web Services: "connectivity" middleware across the Internet; integration of services across organizations.
Related alphabet soup: SOAP, XML, WSDL, HTTP, .NET, etc.
5 Exploration Goals
We wanted to understand Globus (at least its connectivity)
Huge piece of software (3.9.x): 123,839 lines of Java (without reused libraries); 1,908,810 lines of C/C++ (including reused libraries)
Many libraries / technologies involved: XML, WSDL (Description Language), WSRF (Resource Framework), Axis (Java-to-SOAP bridge), Xerces (XML parsing), com.ibm.wsdl
How to understand that?
A typical reverse engineering problem
6 Methodology I + First Results
[Diagram: the client sends create resource, subscribe to changes, add 3 and destroy resource requests to the container, which sends notify ×4 back; a tracing Java VM records the execution traces.]
First attempt: tracing everything (outside the JVM libraries). Client: 1,544,734 local method calls (sic); server: 6,466,652 local method calls (sic) [+ time-out]
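The kind of exhaustive call tracing described above can be mimicked in miniature with a profiling hook. The sketch below is Python rather than Java, and purely illustrative of the technique: it counts every local call, keyed by the defining module, which is how per-package invocation totals like the ones above would be produced.

```python
import sys
from collections import Counter

def count_calls(func, *args):
    """Run func under a profiling hook, counting every Python-level
    call keyed by its defining module (analogue of per-package counts)."""
    counts = Counter()

    def hook(frame, event, arg):
        if event == "call":  # one event per local method invocation
            counts[frame.f_globals.get("__name__", "?")] += 1

    sys.setprofile(hook)
    try:
        func(*args)
    finally:
        sys.setprofile(None)  # always detach the hook
    return counts
```

Tracing a whole client run this way, with no filtering, is what yields raw totals in the millions of calls.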
How to visualize such results?
7 Program Visualization: a few Notions
Problem studied for quite a long time now.
Different aspects: collection, manipulation, visualization.
Visualization is some form of projection (many have been proposed).
Our goal: understand software structure:
[Figure: a growing call stack crossing three libraries — lib1.Wale.breath → lib1.Mammal.inhale → lib2.Lung.inhale → lib2.Muscle.contract → lib2.Nerve.transmit → lib3.Signal.travel]
Tracing calls reveals the software structure.
8 Methodology I
[Figure: a call graph obtained by tracing, rooted at lib1.Wale.breath and spanning lib1.Mammal.inhale, lib2.Lung.inhale, lib2.Muscle.contract, lib2.Muscle.stop, lib2.Nerve.transmit, lib3.Blood.flow, lib3.Signal.travel, lib3.Pressure.foo.]
Aggregates invocations of the same library. Chart w.r.t. position in call stack.
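The aggregation just described can be sketched as follows — an illustrative Python rendering of the projection, using hypothetical trace events modelled on the lib1/lib2/lib3 example:

```python
from collections import defaultdict

def annotate_depth(events):
    """events: sequence of ('call' | 'return', qualified_method).
    Yields (method, stack_depth) for each call; depth is 1-based."""
    depth = 0
    for kind, method in events:
        if kind == "call":
            depth += 1
            yield method, depth
        else:
            depth -= 1

def chart_by_library(events, lib_of):
    """Collapse a raw call trace onto (library, stack depth) -> call count,
    i.e. the data behind a 'package activity vs. stack depth' chart."""
    chart = defaultdict(int)
    for method, depth in annotate_depth(events):
        chart[(lib_of(method), depth)] += 1
    return dict(chart)

# Hypothetical trace echoing the slide's example:
trace = [("call", "lib1.Wale.breath"),
         ("call", "lib1.Mammal.inhale"),
         ("call", "lib2.Lung.inhale"),
         ("return", "lib2.Lung.inhale"),
         ("return", "lib1.Mammal.inhale"),
         ("return", "lib1.Wale.breath")]
chart_by_library(trace, lambda m: m.split(".")[0])
# -> {('lib1', 1): 1, ('lib1', 2): 1, ('lib2', 3): 1}
```

Plotting these (library, depth) counts as stacked bars gives the charts that follow.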
[Charts: calls per library vs. stack depth for the toy lib1/lib2/lib3 example, in absolute counts and as percentages.]
9 Methodology I
[Figure: the software structure (lib 1, lib 2, lib 3) side by side with the corresponding package-activity vs. stack-depth chart.]
10 Package Activity vs. Stack Depth
[Chart: number of calls (up to ~180,000) vs. stack depth (1–55) for the client run (1 creation, 4 requests, 1 destruction), broken down by package: org.apache.xerces, org.apache.xml, org.apache.axis, org.apache.log4j, org.apache.xpath, org.apache.commons, com.ibm.wsdl, others.]
11 Package Activity vs. Stack Depth
[Same chart, with the toy lib1/lib2/lib3 chart inset for comparison.]
12 Package Activity vs. Stack Depth
[Same chart, annotated: "Looks better, but is the same!" — the bulk of invocations is due to XML.]
13 Package Activity vs. Stack Depth
[Chart: percentage of calls vs. stack depth (client, percentage view), same package breakdown, with the toy percentage chart inset.]
14 Package Activity vs. Stack Depth
[Same percentage chart, annotated: the parsing is triggered by org.apache.axis, not by Globus itself; the very long stacks are probably caused by recursive use of XML.]
15 What does it tell us?
Most of the local invocations (89%) are related to XML parsing (org.apache.xerces, org.apache.xml).
The parsing is used by Axis (the SOAP/Java bridge from the apache foundation), not directly by Globus.
The very long stacks we observe (up to 57 frames!) most probably reflect some recursive parsing loop, rather than the program structure.
Similar findings on the server side, only more dramatic: stack depths up to 108 (sic), and 4 times more invocations.
16 New Questions
More insight needed:
Does invocation count reflect real performance?
How “bad” is really the platform?
Can we do the same kind of “structural” projection of profiling data?
If yes, is it useful?
Our choice: a 2-step approach: (1) black-box profiling; (2) "internal" profiling using sampling
17 Chosen Approach
2 steps: 1. Black-box profiling: minimal interference, coarse results. 2. Sample-based profiling: less accurate but more detailed.
We focused on the connectivity layer of the WSRF implementation of GT4-Java: low-level "plumbing", no high-level service involved. Motivation: profile the founding bricks of the Globus platform.
Experimental set-up: standalone SMP server running 4 Intel Xeon @ 1.6GHz. No network cost involved! Avoids context-switching overhead! Globus 3.9.4 used (last GT4 alpha release, Dec. 2004)
18 Black-Box Profiling: Approach
Black-box approach: measure externally visible latencies. Many different situations to be considered!
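The clock instrumentation boils down to timing each externally visible operation and separating the first invocation (which absorbs lazy-initialization costs) from the stabilized ones. A minimal sketch — Python with simulated timings, not the actual harness:

```python
import time
import statistics

def black_box_profile(op, warm_runs=50):
    """Time an operation from the outside: the first call absorbs any
    lazy-initialization overhead; later calls give the stabilized latency."""
    t0 = time.perf_counter()
    op()
    first = time.perf_counter() - t0

    stabilized = []
    for _ in range(warm_runs):
        t0 = time.perf_counter()
        op()
        stabilized.append(time.perf_counter() - t0)
    return first, statistics.mean(stabilized)
```

Averaging over repeated runs (×5 inits, many requests) is what lets the measurements separate multi-second set-up costs from the stabilized latencies reported below.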
[Diagram: client/container exchange (create, subscribe, add 3, notify ×3, destroy) with clock instrumentation; averaging over ×5 runs each to isolate the influence of resource init, client init, and container init, and over ×50 request repetitions (10,000 invocations).]
19 Resource Set-Up
[Diagram: client and container, create and subscribe requests, repeated ×5 / ×5.]
20 Resource Set-Up
Container init overhead (~8.2s!); client init overhead (~24.8s!)
High lazy initialization costs! (> 30s!)
Stabilized latency remains high (380ms)
21 First Notification
[Diagram: first notify from container to client, ×5 / ×5.]
22 First Notification
Client init overhead (~1.4s!); container init overhead (~430ms)
Stabilized latency (~1.1s!)
23 Second Notification
[Chart: latency of the second notification. Stabilized request latency now very low (170ms); lazy-initialization costs still high: resource init overhead (~930ms!); stabilized latency of the 1st notification (~1.1s).]
24 Internal Profiling: Basics
Profiling data obtained through sampling (Sun hprof basic profiler on Java 1.4). The JVM is periodically stopped and the stacks of the active threads are captured. Result: a set of weighted stack traces, where a trace's weight measures how often that stack was observed.
Visualization: the set of weighted stacks is a multi-dimensional object: time (represented by weights); threads (each trace belongs to a thread); control flow (represented by stacks, reflects use relationships); code structure (package organization, class hierarchy, etc.)
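hprof's sampling model is easy to reproduce in miniature. The sketch below is a Python stand-in, not the actual hprof mechanism: a background thread periodically snapshots the stacks of all live threads and keeps a weight (hit count) per distinct trace.

```python
import sys
import threading
import time
import traceback
from collections import Counter

class SamplingProfiler:
    """Periodically snapshots every live thread's stack; each distinct
    stack trace accumulates a weight = number of times it was observed."""

    def __init__(self, interval=0.01):
        self.interval = interval
        self.weighted_traces = Counter()  # stack tuple -> hit count
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        own_id = threading.get_ident()
        while not self._stop.is_set():
            for tid, frame in sys._current_frames().items():
                if tid == own_id:
                    continue  # do not sample the profiler itself
                stack = tuple(f.name for f in traceback.extract_stack(frame))
                self.weighted_traces[stack] += 1
            time.sleep(self.interval)

    def start(self):
        self._worker.start()

    def stop(self):
        self._stop.set()
        self._worker.join()
```

Sampling perturbs the application far less than exhaustive tracing, at the price of statistical rather than exact counts.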
Projection (aggregation / collapsing) required
Many possibilities. Our choice: code structure + stack depth.
25 Methodology III
[Figure: a tree of weighted stack traces spanning lib1, lib2 and lib3 (lib1.Wale.breath → lib1.Mammal.inhale → lib2.Lung.inhale → lib2.Muscle.contract → …), with branch weights ×3, ×1 and ×2.]
Sampling yields a set of weighted stack traces (weight reflects time spent)
Aggregates invocations of the same library. Chart w.r.t. position in call stack.
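The projection of weighted traces onto (library, stack depth), in both inclusive and exclusive flavours, can be sketched like this (illustrative Python, with hypothetical traces):

```python
from collections import defaultdict

def project(weighted_traces, lib_of, inclusive=True):
    """Collapse weighted stack traces onto (library, stack depth).

    Inclusive: every frame on a stack is charged the trace's weight
    (time spent in or below it). Exclusive: only the deepest frame is
    charged (time spent strictly in it)."""
    chart = defaultdict(int)  # (library, depth) -> time units
    for stack, weight in weighted_traces.items():
        for depth, frame in enumerate(stack, start=1):
            if inclusive or depth == len(stack):
                chart[(lib_of(frame), depth)] += weight
    return dict(chart)
```

On a trace of weight 3 ending in lib2 at depth 2, the inclusive view charges both lib1 (depth 1) and lib2 (depth 2) three units, while the exclusive view charges only lib2.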
[Charts: time units per library vs. stack depth for the toy example, EXCLUSIVE and INCLUSIVE views.]
26 Experimental Set-Up
[Diagram: client and container exchange create, subscribe, add 3, notify ×3, destroy (×5 / ×5); hprof attached to the Java VM produces the profiling data.]
27 Container Profiling: Results
Sharp drop at depth 13: busy waiting related to notification management, outside the request critical path. Layered structure: the upper stack is quite regular. Some very deep traces beyond depth 28 (recursion?). org.apache.axis predominant.
28 New Experimental Set-Up
[Diagram: same set-up (create, subscribe, add 3, notify ×3, destroy; ×5 / ×5), with hprof attached to the Java VM and extra granularity to observe package org.apache.axis in the profiling data.]
29 New Results
Traces of length 13 have disappeared: they were caused by the notification management.
sun.reflect (reflection)
org.globus.gsi (security)
org.globus.wsrf. This is a recursion in org.apache.wsdl.symbolTable (web services). A symbol management issue?
30 Profiling Breakdown
Abstracting away low-level packages (java.*, etc.), the sample breakdown among "higher-level" packages is:

Package Name               Samples    %
org.apache.axis.wsdl           231   21%
org.apache.axis.encoding        66    6%
org.apache.axis (others)       113   10%
org.globus.gsi                 249   23%
org.globus.wsrf                 49    4%
cryptix.provider.rsa            82    7%
org.apache.xerces               78    7%
others                         237   21%
31 Profiling Breakdown: org.apache.axis.wsdl dominant — a symbol management issue?
32 Profiling Breakdown: SOAP + XML together account for 44% of samples.
33 Profiling Breakdown: security / cryptography account for 30% of samples.
34 Many Other Visualisation Ways
[Chart: time units and average stack depth per thread (main, Thread-0 … Thread-29, process reaper, Reference Handler), v3.9.2.]
35 Summing Up
Globus lazy optimisation: very high latency on the first invocation of operations (up to 30s to set up a resource on a new container!). Stabilized latencies still high: ~160ms for a round-trip request (with authentication turned on)
No clear culprit. Technology overload: WSDL, SOAP, security
Is lazy optimisation a problem? Yes and No.
36 Longer Term
Platforms and technologies come and go Globus is a moving target
But experimental approaches stay
And so do development practices
37 Middleware Practices: Are we doomed?
Lazy optimization: a flexibility paradox
Poor performance: abstraction leaking
Reverse engineering: Frankenstein's return?
Can Cognitive-based Middleware save us?
APIs are for real beings!
38 To look further
Globus profiling: "The Impact of Web Service Integration on Grid Performance", François Taïani, Matti Hiltunen, Rick Schlichting, HPDC-14, 2005
Large-graph reverse engineering: "CosmOpen: Dynamic reverse-engineering on a budget", François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, TR COMP-002-2008, Lancaster University, 2008. http://ftaiani.ouvaton.org/7-software/index.html#CosmOpen
Next Generation Middleware Group at Lancaster
39 The End (Thank you)
40 Package Activity vs. “Progress”
[Chart: package activity vs. progress for the client run (1 creation, 4 requests, 1 destruction): each slice shows the percentage of calls per package (org.apache.xerces, com.ibm.wsdl, org.apache.xpath, org.apache.xml, org.apache.axis, org.apache.log4j, org.apache.commons, org.xml.sax, org.globus.wsrf, org.apache.naming, others) against progress measured in % of local calls, with remote calls marked.]
41 Profiling Results (Exclusive, Server)
[Chart: hits per package vs. stack depth (1–49), exclusive view, server side: java.net, java.security, sun.security.provider, org.apache.axis, java.util.zip, java.lang, org.apache.xerces, others.]