Explorations in the Grid/WS Computing Jungle

A Narrative of the Reverse-Engineering Attempts of a “New” Distributed Paradigm

François Taïani, Matti Hiltunen & Rick Schlichting

Opera Talk, 6 May 2008, Comp. Dept., Cambridge University

“The best and safest method of philosophizing seems to be, first to inquire diligently into the properties of things, and to establish those properties by experiences, and then to proceed more slowly to hypotheses for the explanation of them. For hypotheses should be employed only in explaining the properties of things, but not assumed in determining them; unless so far as they may furnish experiments.” (Isaac Newton)

2 Context: Middleware Complexity

ORBacus Request Processing

one request, 2,065 individual invocations, spanning more than 50 C functions and 140 C++ classes.

3 Web Services’ Bright New World

 Grid Computing: federating resources over the Internet
 Web Services: integrating services

 Globus (Argonne Lab.): reference implementation

 Large, complex, collaborative middleware (IBM, Apache,...)

 Very poor performance:
 Up to 30s to create a simple distributed object (a counter)
 Up to 2s for a round-trip remote add operation on this counter

 Where does this cost come from?
 Does it tell us something about modern middleware development?

4 Globus

 Reference Implementation of the Grid Standards.  Developed by the “Globus alliance”, a partnership around Argonne National Laboratory.

 Globus is a high-level “middleware” (software glue)
 It offers services to share and use remote distributed “resources” (CPU, memory, DB, bandwidth)

 Since version 3.9.x, Globus uses Web Service “connectivity”
 Web Services: “connectivity” middleware across the Internet
 Integration of services across organizations
 Related alphabet soup: SOAP, XML, WSDL, HTTP, .NET, etc.

5 Exploration Goals

 We wanted to understand Globus (at least its connectivity)

 Huge piece of software (3.9.x):  123,839 lines in Java (without reused libraries)  1,908,810 lines in C/C++ (including reused libraries)

 Many libraries / technologies involved:
 XML, WSDL (Description Language), WSRF (Resource Framework)
 Axis (Java-to-SOAP bridge), Xerces (XML parsing), com.ibm.wsdl

 How to understand that?

 A typical reverse engineering problem

6 Methodology I + First Results

[Diagram: client ↔ container. Workload: create resource, subscribe to changes, add ×3, notify ×4, destroy resource. A tracing Java VM records the execution traces.]

 First attempt: tracing everything (outside the JVM libs)
 client: 1,544,734 local method calls (sic)
 server: 6,466,652 local method calls (sic) [+ time out]
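The "trace everything" step can be sketched in a few lines. The fragment below is a hypothetical Python analogue (the authors instrumented the JVM, not Python): it hooks every local function call and counts calls per top-level package, exactly the raw data the slides aggregate later. The `json.dumps` call is just a tiny stand-in workload.

```python
import sys
from collections import Counter

counts = Counter()

def tracer(frame, event, arg):
    """Global trace hook: fires once per function call."""
    if event == "call":
        module = frame.f_globals.get("__name__", "?")
        counts[module.split(".")[0]] += 1  # aggregate by top-level package
    return None  # no line-level tracing needed

sys.settrace(tracer)
import json
json.dumps({"counter": 3})  # stand-in workload to be traced
sys.settrace(None)

print(counts.most_common())
```

Even this toy run shows how quickly call counts pile up once library internals are included, which is what made the Globus traces explode into the millions.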

 How to visualize such results?

7 Program Visualization: a few Notions

 A problem studied for quite a long time now.

 Different aspects: collection, manipulation, visualization.

 Visualization is some form of projection (many have been proposed).

 Our goal: understand software structure:

lib 1   lib1.Whale.breathe
        lib1.Mammal.inhale
lib 2   lib2.Lung.inhale        (growing
        lib2.Muscle.contract     call
        lib2.Nerve.transmit      stack)
lib 3   lib3.Signal.travel

 Tracing calls reveals the software structure.

8 Methodology I

lib1.Whale.breathe
  lib1.Mammal.inhale
    lib2.Lung.inhale
      lib2.Muscle.contract
        lib2.Nerve.transmit
          lib3.Signal.travel
      lib2.Muscle.stop
        lib3.Blood.flow
          lib3.Pressure.foo
      lib2.Nerve.transmit
        lib3.Signal.travel

A call graph obtained by tracing

 Aggregate invocations belonging to the same library.
 Chart them w.r.t. their position in the call stack.
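These two steps (aggregate by library, chart by stack depth) can be sketched concretely. The following Python fragment uses the slide's toy trace (spelling of the example names normalized) and is only an illustration, not the authors' tooling: it turns a list of (stack depth, callee) events into the per-library, per-depth counts that the charts plot.

```python
from collections import defaultdict

# Traced call events: (stack depth, fully qualified callee),
# taken from the slide's toy example.
trace = [
    (1, "lib1.Whale.breathe"),
    (2, "lib1.Mammal.inhale"),
    (3, "lib2.Lung.inhale"),
    (4, "lib2.Muscle.contract"),
    (5, "lib2.Nerve.transmit"),
    (6, "lib3.Signal.travel"),
]

# counts[depth][library] = number of invocations observed there.
counts = defaultdict(lambda: defaultdict(int))
for depth, callee in trace:
    counts[depth][callee.split(".")[0]] += 1

for depth in sorted(counts):
    print(depth, dict(counts[depth]))
```

Plotting `counts` as stacked bars per depth (absolute or as percentages) gives exactly the "Package Activity vs. Stack Depth" views that follow.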

[Charts: calls per library (lib1, lib2, lib3) vs. stack depth (1–6), in absolute counts and in percentages]

9 Methodology I

lib 1

lib 2

lib 3

Software Structure

Package Activity vs. Stack Depth

10 Package Activity vs. Stack Depth

[Chart: calls (up to 180,000) vs. stack depth (1–55); client, 1 creation, 4 requests, 1 destruction. Packages: org.apache.xerces, org.apache.xml, org.apache.axis, org.apache.log4j, org.apache.xpath, org.apache.commons, com.ibm.wsdl, others]

11 Package Activity vs. Stack Depth

[Same chart, with an inset recalling the lib1/lib2/lib3 toy projection]

12 Package Activity vs. Stack Depth

[Same chart, annotated: “Looks better, but is the same!” (a large share of invocations due to XML)]

13 Package Activity vs. Stack Depth

[Chart: percentage of calls vs. stack depth (client, percentage view), same packages, with the toy-example percentage projection as an inset]

14 Package Activity vs. Stack Depth

[Same percentage chart, annotated: “Very long stack! XML probably used recursively by org.apache.axis parsing, not by Globus!”]

15 What does it tell us?

 Most of the local invocations (89%) are related to XML parsing (org.apache.xerces, org.apache.xml).

 The parsing is used by Axis (the SOAP/Java bridge from the apache foundation), not directly by Globus.

 The very long stacks we observe (up to 57 frames!) most probably reflect some recursive parsing loop, rather than the program structure.

 Similar findings on the server side, only more dramatic: stack depth up to 108 (sic), and 4 times more invocations.

16 New Questions

More insight needed:

 Does invocation count reflect real performance?

 How “bad” really is the platform?

 Can we do the same kind of “structural” projection of profiling data?

 If yes, is it useful?

 Our choice: a 2-step approach
 (1) Black-box profiling
 (2) “Internal” profiling using sampling

17 Chosen Approach

 2 steps:
 1. Black-box profiling: minimal interference; coarse results.
 2. Sample-based profiling: less accurate but more detailed.

 We focused on the connectivity of the WSRF implementation of GT4-Java:  Low level “plumbing”. No high level service involved  Motivation: profile the founding bricks of the Globus platform

 Experimental set-up:
 Standalone SMP server with 4 Intel Xeon CPUs @ 1.6 GHz
 No network cost involved!
 Avoids context-switching overhead!
 Globus 3.9.4 used (last GT4 alpha release, Dec. 2004)

18 Black-Box Profiling: Approach

 Black Box Approach: Measure externally visible latencies  Many different situations to be considered!
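A minimal sketch of this black-box measurement, with a hypothetical `fake_remote_add` standing in for a real remote request (none of these names come from Globus): time each externally visible invocation, then report the first call (which pays the lazy-initialization costs) separately from the stabilized average over the remaining runs.

```python
import time

def measure(operation, runs=20):
    """Return (first-call latency, stabilized average) in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()  # the externally visible call being measured
        latencies.append(time.perf_counter() - start)
    # First call typically includes lazy initialization; average the rest.
    return latencies[0], sum(latencies[1:]) / (runs - 1)

def fake_remote_add():
    # Hypothetical stand-in for the counter's remote 'add' request.
    time.sleep(0.001)

first, stabilized = measure(fake_remote_add)
print(f"first: {first * 1000:.2f} ms, stabilized: {stabilized * 1000:.2f} ms")
```

Separating the first run from the stabilized runs is what lets the following slides attribute the 30s start-up to lazy initialization rather than to steady-state request cost.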

[Diagram: client ↔ container. Workload: create, subscribe, add ×3, notify ×3, destroy. Clock instrumentation; averaging over 10,000 invocations; runs repeated ×5 (influence of resource init), ×5 (influence of client init), and ×50 (influence of container init)]

19 Resource Set-Up

[Chart: resource set-up (create + subscribe) latencies, client ↔ container, repeated runs (×5, ×5)]

20 Resource Set-Up

Container init overhead (~8.2s!); client init overhead (~24.8s!)

High lazy-initialization costs (> 30s)!

Stabilized latency remains high (380ms)

21 First Notification

[Chart: first-notification latencies, client ↔ container, repeated runs (×5, ×5)]

22 First Notification

Client init overhead (~1.4s!); container init overhead (~430ms)

Stabilized latency (~1.1s!)

23 Second Notification

23 Second Notification

[Chart: 2nd-notification latencies, client ↔ container, repeated runs (×5, ×5). Resource init overhead (~930ms!): lazy initialization still here. Stabilized request latency now very low (170ms), vs. ~1.1s for the 1st notification]

24 Internal Profiling: Basics

 Profiling data obtained through sampling (Sun hprof basic profiler on Java 1.4)
 JVM periodically stopped; the stack of the active thread is captured.
 Result: a set of weighted stack traces, where a weight measures how often the stack was observed.

 Visualization: a set of weighted stacks is a multi-dimensional object:
 Time (represented by weights)
 Threads: each trace belongs to a thread
 Control flow (represented by stacks; reflects use relationships)
 Code structure (package organization, class hierarchy, etc.)

 Projection (aggregation / collapsing) required

 Many possibilities. Our choice: code structure + stack depth.

25 Methodology III

lib1.Whale.breathe
  lib1.Mammal.inhale
    lib2.Lung.inhale
      lib2.Muscle.contract
        lib2.Nerve.transmit
          lib3.Signal.travel ×3
      lib2.Muscle.stop
        lib3.Blood.flow
          lib3.Pressure.foo ×1
      lib2.Nerve.transmit
        lib3.Signal.travel ×2

Sampling yields a set of weighted stack traces (weight reflects time spent)

 Aggregate samples belonging to the same library.
 Chart them w.r.t. their position in the call stack.
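Both projections of the weighted stack traces can be sketched as follows. The sample set mirrors the slide's toy example (the exact trace topology is one plausible reading of the figure, and the names are illustrative): the exclusive view charges each sample's weight only to the topmost frame, while the inclusive view charges it to every frame on the stack.

```python
from collections import defaultdict

# Weighted stack traces: (weight, stack from caller to callee).
samples = [
    (3, ["lib1.Whale.breathe", "lib1.Mammal.inhale", "lib2.Lung.inhale",
         "lib2.Muscle.contract", "lib2.Nerve.transmit", "lib3.Signal.travel"]),
    (1, ["lib1.Whale.breathe", "lib1.Mammal.inhale", "lib2.Lung.inhale",
         "lib2.Muscle.stop", "lib3.Blood.flow", "lib3.Pressure.foo"]),
    (2, ["lib1.Whale.breathe", "lib1.Mammal.inhale", "lib2.Lung.inhale",
         "lib2.Muscle.contract", "lib2.Nerve.transmit", "lib3.Signal.travel"]),
]

exclusive = defaultdict(int)  # weight charged to the top-of-stack frame only
inclusive = defaultdict(int)  # weight charged to every (library, depth) on the stack

for weight, stack in samples:
    exclusive[(stack[-1].split(".")[0], len(stack))] += weight
    for depth, frame in enumerate(stack, start=1):
        inclusive[(frame.split(".")[0], depth)] += weight

print("exclusive:", dict(exclusive))
print("inclusive:", dict(inclusive))
```

The inclusive view shows which libraries are on the critical path at each depth; the exclusive view shows where the sampled time is actually spent.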

[Charts: time units per library (lib1, lib2, lib3) vs. stack depth (1–6), in EXCLUSIVE and INCLUSIVE views]

26 Experimental Set-Up

[Diagram: client ↔ container. Workload: create, subscribe, add ×3, notify ×3, destroy; repeated runs (×5, ×5). hprof attached to the Java VM produces the profiling data]

27 Container Profiling: Results

 Sharp drop at depth 13: busy waiting related to notification management, outside the request critical path.
 Layered structure, quite regular for upper stack depths.
 Some very deep traces beyond depth 28 (recursion?).
 org.apache.axis predominant.

28 New Experimental Set-Up

[Diagram: same set-up as before (create, subscribe, add ×3, notify ×3, destroy; hprof on the Java VM), with extra granularity to observe package org.apache.axis]

29 New Results

Traces of length 13 have disappeared: they were caused by the notification management.

sun.reflect (reflection)

org.globus.gsi (security)

org.globus.wsrf (web services)

This is a recursion in org.apache.wsdl.symbolTable. Symbol management issue?

30 Profiling Breakdown

 Abstracts away low level packages (java.*, etc.)

 Sample breakdown among “higher level” packages:

  Package Name               Samples    %
  org.apache.axis.wsdl          231    21%
  org.apache.axis.encoding       66     6%
  org.apache.axis (others)      113    10%
  org.globus.gsi                249    23%
  org.globus.wsrf                49     4%
  cryptix.provider.rsa           82     7%
  org.apache.xerces              78     7%
  others                        237    21%

31 Profiling Breakdown

Same breakdown, with three highlights:
 org.apache.axis.wsdl (21%): symbol management issue?
 SOAP + XML: 44%
 Security / Cryptography: 30%

34 Many Other Visualisation Ways

[Chart: time units and average stack depth per JVM thread (main, Thread-0 … Thread-29, process reaper, Reference Handler), v3.9.2]

35 Summing Up

 Globus  Lazy optimisation: very high latency on first invocation of operations (up to 30s to set up a resource on a new container!)  Stabilized latencies still high: ~ 160ms for a round trip request (with authentication turned on)

 No clear culprit. Technology overload: WSDL, SOAP, security

 Is lazy optimisation a problem? Yes and No.

36 Longer Term

 Platform and technology come and go  Globus is a moving target

 But experimental approaches stay

 And so do development practices

37 Middleware Practices: Are we doomed?

 Lazy optimization  flexibility paradox

 Poor performance  abstraction leaking

 Reverse engineering  Frankenstein’s return?

 Can Cognitive-based Middleware save us?

 APIs are for real beings!

38 To look further

 Globus profiling
 “The Impact of Web Service Integration on Grid Performance”, François Taïani, Matti Hiltunen, Rick Schlichting, HPDC-14, 2005

 Large graph reverse engineering
 “CosmOpen: Dynamic Reverse-Engineering on a Budget”, François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, TR COMP-002-2008, Lancaster University, 2008
 http://ftaiani.ouvaton.org/7-software/index.html#CosmOpen

 Next Generation Middleware Group at Lancaster

39 The End (Thank you)

40 Package Activity vs. “Progress”

[Chart: package activity (% of remote calls in progress slice) vs. progress (% of local calls); client, 1 creation, 4 requests, 1 destruction. Packages: org.apache.xerces, com.ibm.wsdl, org.apache.xpath, org.apache.xml, org.apache.axis, org.apache.log4j, org.apache.commons, org.xml.sax, org.globus.wsrf, org.apache.naming, others]

41 Profiling Results (Exclusive, Server)

[Chart: hits (0–100) vs. stack depth (1–49), server, exclusive view. Packages: java.net, java.security, sun.security.provider, org.apache.axis, java.util.zip, java.lang, org.apache.xerces, others]