Java Does not Distribute

Gerald Brose, Klaus-Peter Löhr, André Spiegel

{brose,lohr,spiegel}@inf.fu-berlin.de

Technical Report B

October

Abstract

Java is commonly considered the ideal language for implementing software for the Internet. A closer look, however, reveals that distributed programming is poorly supported in Java. This is because the very design of the language rules out distribution-transparent remote invocation. It is shown that Sun's technology for distributed Java programming, RMI, makes things worse by allowing two different invocation semantics to hide behind an object variable. The consequences of using CORBA instead of RMI are investigated. Various options for changing either RMI or Java itself are considered, so that language platforms supporting a high degree of distribution transparency could be built.

Freie Universität Berlin

Institut für Informatik

Takustraße

D Berlin Germany

This report also appears in Proc. TOOLS Pacific, Melbourne, Australia, November.

Introduction

Java [Gosling] is being heralded as the ultimate language for internet-based distributed programming. But when it comes to actually writing distributed applications, a serious flaw in the basic design of the language becomes apparent: Java's type system, together with its mechanism of parameter passing, leads to problems with remote method invocation.

The current version of Java does not allow for a distributed implementation where remote method invocation would have the same semantics as its local counterpart. The problem is also not adequately addressed in RMI [Sun], Sun's proprietary technology for distributed Java programming. As a result, semantic deviations from the intended program logic can occur in distributed programs written in Java/RMI. It is therefore questionable whether Java in its current form is a suitable platform for distributed programming.

The problem is explained in the section on remote parameters in Java, which is followed by a discussion of RMI. A further section explores the combination of Java and CORBA [OMG]. Our observations suggest that revising the language itself may be the only satisfactory solution. Several options for this are discussed in the section on revising Java, where our preferred option is also presented.

Remote parameters in Java

Java's method invocation mechanism is not suitable for distributed programming based on remote invocation. To show this, we will first sketch the syntax and semantics of purely local invocations, focussing on parameter passing, and then look at the consequences of extending this invocation semantics to the distributed case.

Java parameter passing

An ordinary Java method is invoked using the familiar notation

    objectRef.methodName(arg1, ..., argn)

Java, like C, has only one parameter passing mode: pass-by-value. For primitive types such as integer, character, etc., this means that the method receives a copy of the actual value. If any changes are made to such a copy during the execution of the method, these changes are not visible in the calling context.

If an actual parameter has a reference type, a reference to an object is copied, causing the object to be shared among caller and callee. Any changes the method makes to the object are visible in the calling context. In Java, all instances of classes have reference types, including strings (which are instances of the library class String) and arrays (for which no library class exists). Java provides no parameter passing mode that would dereference a parameter and deliver a copy of the object, as e.g. Eiffel does using expanded types [Meyer]. We say that Java passes objects by reference, not by value.

It is important to note that, since the only way of constructing user-defined types is by writing new classes, all user-defined types are reference types. Thus, there is no way of passing an object of a user-defined class by value.
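The two cases described above can be illustrated with a minimal sketch. The class and method names are made up for this illustration; only standard Java is used.

```java
import java.util.Arrays;

public class ParameterPassingDemo {
    // Primitive parameter: the method works on its own copy of the value.
    static void increment(int n) {
        n = n + 1;
    }

    // Reference parameter: the reference is copied, the array object is shared.
    static void incrementFirst(int[] cells) {
        cells[0] = cells[0] + 1;
    }

    public static void main(String[] args) {
        int n = 1;
        int[] cells = {1, 2, 3};
        increment(n);
        incrementFirst(cells);
        System.out.println(n);                      // prints 1: the caller's n is unchanged
        System.out.println(Arrays.toString(cells)); // prints [2, 2, 3]: the change is visible
    }
}
```

The array case is the one that matters for the rest of the discussion: the callee never receives a private copy of the array unless the programmer copies it explicitly.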

Parameters in distributed programs

In a distributed program, object invocation may cross machine boundaries. If the caller and the callee reside on different machines, invocation must be implemented as remote method invocation, which is the object-oriented analog to remote procedure call (RPC). Instead of passing the arguments in processor registers or on the stack, the caller must pack them into a network message, and the callee must unpack them. This is called marshalling and unmarshalling of the parameters.

A problem arises when references are involved in passing parameters remotely: a reference is usually represented as a virtual address in the caller's address space, and it makes no sense to dereference it in a different address space. So the reference has to be passed in a network-wide representation, which can in turn be used for remotely invoking the object it refers to.

How does this apply to Java? The first observation is that there is no obstacle to implementing remote invocation for programmer-defined classes. Reference-type parameters can be passed in a network-wide representation; embedded references can be taken care of in the same manner.

A closer view, however, reveals that the resulting programming platform is of little use. Generally speaking, it is alright to distribute services in this way, i.e. passing references to objects that are invoked infrequently, so that the increased latency of remote invocations is negligible. But what about distributing data, i.e. passing arrays and records (objects without methods)? If a large array of numbers is passed to a remote method for processing, the receiver just gets the reference. Processing the array may then require one remote access per element; given the latency of each individual access, this approach is plainly prohibitive, not to mention the problem of compiling array indexing into remote access. The same problem is encountered with records, where, while not as drastic, it is still annoying. With strings, fortunately, the problem disappears. Strings are immutable objects in Java, so passing a string by reference can be implemented as pass-by-value, because the semantics is the same.

RMI

Sun's Remote Method Invocation (RMI) system aims at providing mechanisms for "seamless remote invocation on objects in different virtual machines" [Sun], and tries to "integrate the distributed object model into the Java language in a natural way while retaining most of the Java language's object semantics". It turns out that the keyword is "most".

tries RMI

RMI does preserve the regular Java invocation syntax, but parameters of remote operations are treated differently from those of local operations. Remote passing is done as follows. If an actual parameter has a primitive type, it is passed by value. If an actual parameter has a reference type and its class implements the Remote interface, it is passed by reference, the local reference being replaced with a network reference. If the class does not implement Remote, but rather implements the Serializable interface, the object is passed by value, using serialization. If the class implements neither Remote nor Serializable, an exception is raised. Arrays are serializable by default.
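The selection rule just described can be written down as a small decision function. This is only a sketch of the rule as stated in the text, not code taken from RMI: the class RmiPassingRule, the enum Mode and the method modeFor are hypothetical names, and primitive values appear here in boxed form. Only the marker interfaces java.rmi.Remote and java.io.Serializable are real library types.

```java
import java.io.Serializable;
import java.rmi.Remote;

public class RmiPassingRule {
    enum Mode { BY_VALUE, BY_REFERENCE }

    // Hypothetical remote class, used only to exercise the rule.
    static final class RemoteCounter implements Remote { }

    // Hypothetical helper: picks the passing mode from the type of the
    // actual argument, following the order given in the text.
    static Mode modeFor(Object actual) {
        if (actual instanceof Remote) {
            return Mode.BY_REFERENCE;      // replaced by a network reference
        }
        if (actual instanceof Serializable) {
            return Mode.BY_VALUE;          // copied via serialization
        }
        throw new IllegalArgumentException("neither Remote nor Serializable");
    }

    public static void main(String[] args) {
        System.out.println(modeFor(new int[] {1, 2, 3}));   // arrays are serializable
        System.out.println(modeFor(new RemoteCounter()));
        System.out.println(modeFor(Integer.valueOf(42)));   // boxed primitive
    }
}
```

Note that the decision depends only on the argument's class, never on the operation being called; this is the property criticised later in the paper.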

While this seems to solve the main problem, by not promoting arrays to full remote objects and allowing for records to be passed by value rather than by reference, this approach has its own pitfalls. Consider the following example. Let RemoteStack denote a class for remotely invokable stacks of integers (implementing Remote). A stack stores its values in an array int[] cells. In addition to the usual stack operations (such as push, pop, top), it exports an operation

    public int[] dump() {
        return cells;
    }

delivering the values currently stored in the stack. There are several stacks in our distributed system, registered in a StackDictionary object. Our demonstration program plays with two stacks listed in the dictionary:

    public class RMItest {
        static RemoteStack s1 = null;
        static RemoteStack s2 = null;
        static StackDictionary stacks = new StackDictionary();

        public static void main(String[] args) {
            // stacks are created at different sites
            // and registered with the dictionary
            s1 = stacks.get("that");
            s2 = stacks.get("this");
            s1.push(1);
            s2.push(1);
            int[] there = s1.dump();
            int[] here = s2.dump();
            there[0] = 0;
            here[0] = 0;
            System.out.print("there: " + s1.top());
            System.out.print("here: " + s2.top());  // expected to print the pushed value, but fails
        }
    }

As dump delivers an array, and arrays are Serializable by default, the result of a remote invocation will be a copy of the stack's array. So the output of the program should be

    there: 1 here: 1

but be also prepared to see

    there: 1 here: 0

Why is this? It is possible that one of the stacks (s2 in this case) happens to be local to the caller, which causes the standard parameter passing semantics to take effect. So a reference to the array will be delivered, causing the assignment to here[0] to effectively modify the stack.
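The effect can be reproduced without a network. The following sketch is an assumption-laden stand-in for the example: the class DumpSemanticsDemo and its nested Stack are hypothetical, and a remote call is imitated by sending the result of dump through Java object serialization, which is the copying step the text attributes to RMI for Serializable values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class DumpSemanticsDemo {
    // Simplified stand-in for RemoteStack (no Remote interface, no registry).
    static class Stack {
        private final int[] cells = new int[8];
        private int size = 0;
        void push(int value) { cells[size++] = value; }
        int top() { return cells[size - 1]; }
        int[] dump() { return cells; }
    }

    // Imitates a remote call to dump: the result travels through serialization,
    // so the caller receives a copy of the array.
    static int[] remoteDump(Stack stack) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(stack.dump());
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (int[]) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Stack remote = new Stack();
        Stack local = new Stack();
        remote.push(1);
        local.push(1);
        int[] there = remoteDump(remote); // copy
        int[] here = local.dump();        // alias of the stack's own array
        there[0] = 0;
        here[0] = 0;
        System.out.println("there: " + remote.top()); // prints there: 1
        System.out.println("here: " + local.top());   // prints here: 0
    }
}
```

The two calls look identical to a client that only holds references of the same type; which of the two behaviours it gets depends solely on where the stack happens to live.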

The example shows that not only can two seemingly alike objects have different behaviour, but, worse, there is no way to tell them apart. This is because it is impossible to find out whether an object with a Remote interface is indeed remote or actually local to the caller.

If we inadvertently, and unknowingly, use a Remote object locally, as in the example, the operation semantics may be different from what was expected. Note that we are not referring to a different exceptional semantics in the face of network latency or failure, but to the successful execution of an operation.

Also note that there is indeed no way of telling remote from local objects, even if we knew that taking one for the other might cause problems: both objects have the same type, and we have no way of tracking down every single object reference in a system. RMI turns out to be dangerously confusing: it gives the programmer the false impression that, although Remote and Serializable parameters are treated differently, at least all objects of a certain class, whether implementing Remote or not, behave alike. As we have seen, this may not be the case.

Fixing RMI

The problem illustrated above arises because, with RMI, Remote objects can in fact be accessed locally as well, and in this case they exhibit a subtly different behaviour.

One approach to correct this is to draw the distinction between local objects and remote objects more consistently. For example, we might disallow local access to Remote objects completely. Implementing Remote would then mean that objects of the class can only be called from a remote machine. This would clearly eliminate the unpleasant behaviour of the example, but seems very restrictive.

As an alternative, we could force every call to a Remote object, whether it is a local call or a true remote call, to have the same semantics: that of the remote case. Parameters that would have to be serialized for a remote call must also be passed by value if the call is actually local. Thus we should always get the same semantics when calling a particular object, although, just as before, we have to deal with two sets of semantics in general: a local semantics for local objects, and a remote semantics for objects that are even potentially remote. If we want to make an existing class remotely accessible, we may have to change both the method implementations and the calls to those methods from other classes. If we forget any of these changes, there is no way the compiler could warn us.
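In today's Java, the same uniformity can be approximated by hand, at the cost of a copy on every call. The sketch below is one possible reading of this alternative, not a feature of RMI; the class UniformSemanticsStack is a hypothetical name.

```java
public class UniformSemanticsStack {
    private final int[] cells = new int[8];
    private int size = 0;

    public void push(int value) {
        cells[size++] = value;
    }

    public int top() {
        return cells[size - 1];
    }

    // Always hands out a copy, so that a local caller sees the same
    // by-value result that a remote caller would get through serialization.
    public int[] dump() {
        return cells.clone();
    }

    public static void main(String[] args) {
        UniformSemanticsStack stack = new UniformSemanticsStack();
        stack.push(1);
        int[] here = stack.dump();
        here[0] = 0;                     // modifies the copy only
        System.out.println(stack.top()); // prints 1
    }
}
```

The drawback named in the text remains: nothing forces the programmer to write the clone call, and the compiler cannot detect its absence.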

Moreover, the two kinds of method invocations still look exactly the same in the code. So maybe we should even change the syntax to make it apparent that a potentially remote call is something different, for example by using one invocation notation for local semantics and a visibly different one for remote semantics.

Using runtime checks, the compiler would ensure that a remotely accessible object is always called using the remote notation, and that a local object is always called using the local notation. Thus, the programmer would really have to think about each call, especially when converting existing code; something that seems quite consistent with the overall design of RMI.

Beyond RMI

RMI embodies a fundamental assumption about how distributed computing should be done. The chief designers of RMI take the view [Waldo] that remote objects are intrinsically different from local objects, and that this difference must not be hidden by the distribution technology in use. In other words, they do not subscribe to the idea of distribution transparency, which has been favored by other researchers and which aims at just the opposite: to make the distinction between local objects and remote objects invisible for the programmer, so that distribution issues are hidden behind an abstraction boundary.

Distribution transparency has many facets, the most important of which is access transparency. We speak of access transparency when the syntax and semantics of object invocation are identical for both local and remote objects. Ideally, remote method invocation should just refer to a distributed implementation of otherwise unchanged (with respect to syntax and semantics) method invocation. There are numerous programming languages which achieve this ideal to a high degree, ranging from more experimental ones [Bal, Cardelli] to mainstream languages like Ada [Barnes].

This is not to say that access transparency is an absolute value. There are areas where non-transparent technologies like RMI might be appropriate (see also [Kiczales, Lea]). RMI in its current form, however, is problematic for two reasons. First, there is a confusion about the concept of remoteness: RMI regards it as a property of an object to be remote, but in fact remoteness is a relation between objects (as in "A is remote from B" vs. "A is local to B"). Second, RMI does not, in a sense, live up to its own standards: it sets out to be deliberately non-transparent, but pseudo-transparency lurks in there, manifest in the strange phenomenon that the syntax of local and remote invocation is still the same, but not the semantics.

There have been efforts to hide the RMI technicalities behind a smoother facade, in order to achieve more transparency (see e.g. [Philippsen]). This can, of course, work only to a limited degree, as the semantics will invariably be those of RMI.

CORBA and Java

If RMI is not an appropriate model for a distribution-enabled Java, what is the alternative? An obvious candidate for a distributed object model is CORBA, which we will briefly describe in this section. We will also evaluate the possibility of integrating CORBA with Java.

The CORBA object model

The OMG defines an object model for its Object Management Architecture (OMA) in CORBA [OMG] through the definition of the CORBA interface definition language (IDL). This object model has been designed in order to allow access-transparent distribution and accommodation of such diverse programming languages as C, C++, Cobol, Smalltalk, Java and others through language mappings.

CORBA defines the following rules for passing data to and from operations: all base type values (integers, characters, ...) are passed by value, and so are arrays and records (structs). All object type values are passed by reference. CORBA object types are defined by specifying their interface.

CORBA object references are opaque: they do not betray the locations of objects. The behavior of an object is independent of whether it is invoked locally or remotely. CORBA defines three parameter passing modes: in, out and inout. in is pass-by-value as described for Java above, out is pass-by-result, and inout is pass-by-value-result.
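Java has no out or inout parameters, so these modes have to be emulated when they are needed. The sketch below uses a small mutable holder object for that purpose. The names ParameterModesDemo, IntHolder and divide are invented for this illustration and are not taken from any CORBA language mapping; in a purely local program the holder is simply shared, which only approximates the copy-back behaviour of pass-by-result.

```java
public class ParameterModesDemo {
    // Hypothetical holder: carries a value that the callee may set (out)
    // or read and then update (inout).
    static final class IntHolder {
        int value;
        IntHolder(int value) { this.value = value; }
    }

    // dividend, divisor: "in" parameters (plain pass-by-value)
    // remainder:         "out" parameter (only written by the callee)
    // callCount:         "inout" parameter (read, then written back)
    static int divide(int dividend, int divisor, IntHolder remainder, IntHolder callCount) {
        remainder.value = dividend % divisor;
        callCount.value = callCount.value + 1;
        return dividend / divisor;
    }

    public static void main(String[] args) {
        IntHolder remainder = new IntHolder(0);
        IntHolder callCount = new IntHolder(0);
        int quotient = divide(7, 2, remainder, callCount);
        System.out.println(quotient);        // prints 3
        System.out.println(remainder.value); // prints 1
        System.out.println(callCount.value); // prints 1
    }
}
```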

Obviously, the problems with the Java object model described above do not arise in CORBA, as arrays and records are passed by value. The CORBA object model in itself would allow perfect access transparency. In its present form, however, CORBA lacks another important concept: passing objects by value. The OMG has acknowledged the need to pass objects by value by issuing a corresponding request for proposals [OMG], which recognizes that passing an object by reference is inappropriate for some situations, such as when an object encapsulates a data structure. In these situations, passing the object by value provides for greater efficiency.

Integrating Java with CORBA

If we integrate the CORBA object model with a host language according to an IDL language mapping, one of its premier values, access transparency, is compromised if this host language is not simply an extension of IDL. Unfortunately, even in the case of Java, which is conceptually very close to IDL, a number of object model mismatches arise [Brose].

The first of these is that Java allows the overloading of operation names while CORBA does not. Secondly, the distinction between Java's reference type strings and arrays versus CORBA's base type strings and arrays cannot be concealed: the difference between a null reference and an empty value, which can be expressed in Java, cannot be expressed in CORBA.

The most important point is, again, the difference in the respective parameter passing modes. Obviously, we cannot have both CORBA and Java parameter passing semantics, as the two are mutually exclusive, so there remains a clear difference between calling objects in the CORBA and the Java object models. Hence, an access-transparent integration of CORBA with the Java language is not possible.

Revising Java

We have seen that CORBA provides a suitable object model for access transparency while Java does not. We will therefore examine how Java might be modified with respect to parameter passing. We will discuss a number of options for parameter passing techniques and how the concept of passing objects by value could be realized.

Passing by reference considered harmful

The suitability of the local parameter passing modes for remote invocation has been questioned since the introduction of Remote Procedure Call. The reason is efficiency, an important consideration in a distributed system where communication can become a serious bottleneck. But then, efficiency has always been scrutinized for local parameter passing as well. Programmers and language designers have often taken to using by-reference where by-value would have been adequate, just for efficiency reasons. This misuse, however, works only if the parameter is not modified inadvertently, either by the receiver or through aliasing. The language should give the programmer the choice to pass objects either by value or by reference.

A conceptually attractive technique for implementing pass-by-value is copy-on-write: just a reference is passed, and a copy will be generated on demand, i.e. if and when the object is modified. Unfortunately, efficient implementation of this technique requires virtual memory support, unless writing is excluded a priori, i.e. the language treats value parameters as read-only and prevents the actual parameter from being modified through aliasing.
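Without virtual memory support, copy-on-write can still be imitated in software by checking a flag on every write. The following sketch shows the idea for an integer array; the class CopyOnWriteIntArray and its method passByValue are hypothetical names, the sharing flag is deliberately conservative (a handle may copy once more than strictly necessary), and the class is not thread-safe.

```java
public class CopyOnWriteIntArray {
    private int[] cells;
    private boolean shared; // true while the array may be visible through another handle

    public CopyOnWriteIntArray(int[] initial) {
        this.cells = initial.clone();
        this.shared = false;
    }

    private CopyOnWriteIntArray(int[] cells, boolean shared) {
        this.cells = cells;
        this.shared = shared;
    }

    // "Passes the array by value": only the reference is copied for now,
    // and both handles are marked as sharing the underlying array.
    public CopyOnWriteIntArray passByValue() {
        this.shared = true;
        return new CopyOnWriteIntArray(cells, true);
    }

    public int get(int index) {
        return cells[index];
    }

    // The real copy is made on demand, just before the first modification.
    public void set(int index, int value) {
        if (shared) {
            cells = cells.clone();
            shared = false;
        }
        cells[index] = value;
    }

    public static void main(String[] args) {
        CopyOnWriteIntArray caller = new CopyOnWriteIntArray(new int[] {1, 2, 3});
        CopyOnWriteIntArray callee = caller.passByValue();
        callee.set(0, 99);
        System.out.println(caller.get(0)); // prints 1
        System.out.println(callee.get(0)); // prints 99
    }
}
```

The per-write check is exactly the overhead that makes a pure software solution less attractive than hardware-supported page protection.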

Remember that pass-by-reference is a typical device for low-level imperative programming (as opposed to e.g. functional programming) and should be shunned wherever possible. We should pass an object by reference only if the very job of the receiver is to modify the object, or to sense changes applied to the object by concurrent activities, or to serve as a container for object references. For a counterexample, consider array objects. An array is usually part of the data representation of a larger object. If it is passed as a parameter of an object invocation, it is usually passed by value. Passing by reference is confined to local auxiliary procedures of the object. This reasoning is all the more valid for small record-like objects, and for variables of primitive types (which in Java are passed by value anyway).

Distributed implementation

What does this imply for remote invocation? To begin with, the efficiency properties of pass-by-value and pass-by-reference are inverted. Thus, restrictions on pass-by-reference as suggested above come in handy. Let us make sure that no trouble spots remain:

The object in question is a typical server object (heavyweight and/or immobile and/or infrequently invoked). Pass-by-reference is alright here. If the object is mobile and small, a migrational variant such as pass-by-visit/pass-by-move [Black, Achauer] may exhibit better performance, depending on the access pattern.

For arrays, pass-by-reference must not be allowed in remote invocations. If this restriction is not obeyed, we would get an error message, e.g. from the stub compiler when generating the distributed version. As we have seen, however, the restriction is not a serious one and barely compromises access transparency.

For all other objects (e.g. record-like data objects), passing by reference poses no conceptual problems. It may not be efficient, but, as argued above, will occur only rarely. This would even justify not supporting remote access to public attributes (again producing error messages).

In order to prepare the ground for designing an appropriate modification to Java, we have to have a closer look at pass-by-value for Java objects.

Passing objects by value

Determining the parameter passing mo de

One of the questions connected with passing objects by value is how the parameter passing mode for arguments of a particular operation is determined. The approach taken by RMI is to introduce new interfaces like Remote and Serializable and select the parameter passing mechanism for a particular argument according to this argument's type: if it implements Serializable, it is passed by value. A similar approach is taken in [Netscape], one of the four submissions to the OMG's object-by-value RFP [OMG]. Here, the notion of stateful interfaces is introduced, i.e. interface types that define an explicit representation for the state of an instance. Instances of stateful interfaces are always passed by value.

However, this approach, even if it is applied consistently, is inappropriate. On the one hand, it is not flexible enough, as we cannot pass an object x by reference to an operation f() and by value to another operation g(), which might be desirable, e.g. if f() is designed to make a single update to a data structure embedded in x, and g() needs to access this same data structure frequently, e.g. by traversing it many times without altering it.

On the other hand, it is problematic that an operation cannot rely on the passing semantics used for its arguments, as these may not be known at compile time. A formal parameter type T might have two subtypes T1 and T2, one of which implements Serializable while the other does not. The actual argument supplied to the method invocation might be of any of the types T, T1 and T2, as Java allows polymorphism. This is not a problem in RMI because RMI, as shown above, uses serialization for remote calls only. The RMI stub generator complains if a remote interface extends another, non-remote interface. Thus, it keeps the inheritance hierarchies for remote and non-remote types separate, so that the above problem caused by polymorphism cannot arise. For a more general approach, though, this remains a problem.

The underlying misconception of the whole approach is that the mechanism for passing a parameter to an operation is selected on the basis of the type of the actual argument, so the semantics of the passing mode is regarded as a property of this type, while in fact it is part of the operation semantics. Consequently, it should be expressed in the declaration of the operation's formal parameters and its return type, as already argued for the local case in the discussion of pass-by-reference above.

A new keyword for specifying pass-by-value

We propose the introduction of a new keyword copy, which specifies that a parameter is to be passed by value. Consider the following method declaration:

    public copy int[] aMethod(copy A a, B b, int i)

The parameter passing modes for aMethod are as follows. Both the return value of reference type int[] and the formal parameter a of reference type A are declared using copy; this means that the corresponding objects are passed by value. Using copy does not change the fact that both the formal and the actual parameter are references. The only effect is that, e.g., a refers to a newly created copy of the actual parameter. Of the remaining parameters, b is passed by reference and i is passed by value.
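Since copy is a proposed keyword and not valid Java, its intended effect can only be emulated in current Java by copying explicitly at the points where the keyword would apply. The sketch below does this for the declaration discussed above; the classes A and B, their fields, and the method body are invented for the illustration, and clone is used as one possible stand-in for the copying mechanism.

```java
public class CopyKeywordEmulation {
    // Hypothetical record-like class; clone() yields a shallow copy.
    static final class A implements Cloneable {
        int value = 1;

        @Override
        public A clone() {
            try {
                return (A) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: A is Cloneable
            }
        }
    }

    // Hypothetical object that is meant to be shared with the caller.
    static final class B {
        final int[] history = new int[4];
        int count = 0;
    }

    // Emulates: public copy int[] aMethod(copy A a, B b, int i)
    static int[] aMethod(A a, B b, int i) {
        A local = a.clone();                // effect of "copy A a" on entry
        local.value += i;                   // invisible to the caller
        b.history[b.count++] = local.value; // visible: b is passed by reference
        return b.history.clone();           // effect of "copy int[]" on return
    }

    public static void main(String[] args) {
        A a = new A();
        B b = new B();
        int[] result = aMethod(a, b, 5);
        result[0] = 0;                    // modifies the returned copy only
        System.out.println(a.value);      // prints 1
        System.out.println(b.history[0]); // prints 6
    }
}
```

With the proposed keyword, these copies would be made by the language implementation at the declaration site, so neither the method body nor its callers would need to remember them.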

It is important to remember that, according to the restrictions on remote invocation given above, omitting the first copy would cause an error message to be produced during the generation of the distributed version. Omitting the second copy would produce an error message only if A had public attributes and the system did not support remote attribute access.

The semantics of copy is similar to the semantics of expanded in the Eiffel language [Meyer]. In Eiffel, expanded is used to declare variable names that denote objects, not references to objects. It can also be used to specify pass-by-value semantics for arguments and return values of reference types. However, if a parameter is passed expanded, the receiver will get a copy rather than a reference to a copy.

Serializing object state

To support object copying in a homogeneous environment, a single implicit serialization/deserialization protocol realized by the distribution platform could be used. RMI, e.g., provides such a mechanism. This protocol would specify what parts of the object are to be copied in which order. Such an implicit protocol would not, however, suffice in heterogeneous environments such as CORBA, where interoperability between implementations and platforms is a major concern. In order to realize object-by-value semantics in such a setting, the protocol used for serialization/deserialization of objects needs to be explicitly defined. Alternatively, a negotiation mechanism for selecting a serialization protocol could be specified. In the following, we will assume some reasonable default protocol.

Copies made as a consequence of passing an object by value are shallow copies by default, i.e. objects contained as references within the object are not copied; instead, those references are automatically replaced by network references. If this behavior is not appropriate for a particular situation, the class of the object that is passed must inherit from a special interface, and the programmer needs to provide the specific copying mechanism.
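The local analogue of such a shallow copy is what Object.clone produces by default: fields are copied, but contained objects remain shared. The sketch below illustrates this with two invented classes, Order and Account; in the distributed setting described above, the shared reference would be a network reference instead of a local one.

```java
public class ShallowCopyDemo {
    static final class Account {
        int balance;
        Account(int balance) { this.balance = balance; }
    }

    static final class Order implements Cloneable {
        int quantity;
        Account payer; // contained as a reference, not copied by clone()

        Order(int quantity, Account payer) {
            this.quantity = quantity;
            this.payer = payer;
        }

        @Override
        public Order clone() {
            try {
                return (Order) super.clone(); // shallow: copies fields only
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);  // cannot happen: Order is Cloneable
            }
        }
    }

    public static void main(String[] args) {
        Order original = new Order(1, new Account(100));
        Order copy = original.clone();
        copy.quantity = 5;        // affects the copy only
        copy.payer.balance = 0;   // affects the shared Account
        System.out.println(original.quantity);      // prints 1
        System.out.println(original.payer.balance); // prints 0
    }
}
```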

We assume that an implementation for the appropriate object type (a Java class) is available in the receiving context or can be made available by downloading Java code if necessary. But which is the appropriate type? In order to support polymorphism properly, the receiving context needs to be given an object of the most derived type of the actual argument supplied in the invocation.
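Java's built-in object serialization already behaves this way and can serve as an illustration: the class of the actual object, not the static type of the parameter, is recorded in the stream. The sketch below assumes that both classes are available to the receiver; the names MostDerivedTypeDemo, Shape, Circle and passByValue are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class MostDerivedTypeDemo {
    static class Shape implements Serializable {
        private static final long serialVersionUID = 1L;
        String name() { return "shape"; }
    }

    static class Circle extends Shape {
        private static final long serialVersionUID = 1L;
        @Override
        String name() { return "circle"; }
    }

    // The formal parameter has the static type Shape; the copy that comes
    // out of the stream nevertheless has the class of the actual argument.
    static Shape passByValue(Shape actual) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(actual);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Shape) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Shape received = passByValue(new Circle());
        System.out.println(received.name()); // prints circle
    }
}
```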

Conclusion

We have shown that Java, although explicitly designed with the network in mind, exhibits weaknesses in an important network-related issue and defies access transparency more fiercely than traditional languages. A number of approaches to solving the parameter passing problem have been presented, emphasizing the need for a solution that avoids distribution-related featurism.

A modest modification of Java has been proposed: a new keyword copy for specifying pass-by-value semantics for reference type parameters. It has been argued that carefully choosing the semantically appropriate parameter modes makes a program fit for both centralized and distributed execution. Not surprisingly, the resulting mechanisms are in line with CORBA's model of parameter passing. An interesting perspective would be to evaluate Java as a starting point for designing a CORBA implementation language.

References

[Achauer] Bruno Achauer: The DOWL distributed object-oriented language. Communications of the ACM, September.

[Bal] Henri Bal, Frans Kaashoek, Andrew Tanenbaum: Orca: a language for parallel programming of distributed systems. IEEE Trans. on Software Engineering.

[Barnes] John Barnes: Ada Rationale: The Language, The Standard Libraries. Springer.

httpwwwadahomecomRe sour ces refs rat html

[Black] Andrew Black, Norman Hutchinson, Eric Jul, Henry Levy, Larry Carter: Distribution and Abstract Types in Emerald. IEEE Transactions on Software Engineering, January.

[Brose] Gerald Brose: JacORB: Design and implementation of a Java ORB. Proc. Int. Conf. on Distributed Applications and Interoperable Systems (DAIS), Cottbus, Germany, September. Chapman & Hall.

httpwwwinffuberlin de bros eja corb

[Cardelli] Luca Cardelli: A language with distributed scope. Computing Systems, January.

httpwwwresearchdigit alc omS RCp erso nal Luc a CardelliObliqObliqhtml

[Gosling] James Gosling, Bill Joy, Guy Steele: The Java Language Specification. Addison-Wesley.

http://java.sun.com

[Kiczales] Gregor Kiczales: Beyond the black box: open implementation. IEEE Software, January.

[Lea] Doug Lea: Design for open systems in Java. Proc. Int. Conf. on Coordination Languages and Models, LNCS, Springer.

[Meyer] Bertrand Meyer: Object-Oriented Software Construction, 2nd ed. Prentice Hall.

[Netscape] Netscape, Novell, Visigenic: Objects By Value, Joint Initial Submission. OMG TC document orbos.

ftpftpomgorgpubdoc sor bos ps

[OMG] OMG: The Common Object Request Broker: Architecture and Specification. Revision, July.

http://www.omg.org

[OMG] OMG: Objects-by-Value RFP. OMG document orbos, June.

ftpftpomgorgpubdoc sor bos ps

[Philippsen] Michael Philippsen, Matthias Zenger: JavaParty: transparent remote objects in Java. Proc. ACM PPoPP Workshop on Java for Science and Engineering Computation, Las Vegas, June.

httpwwwipdiraukade phl ipp part ygs gz

[Sun] Sun Microsystems: Java Remote Method Invocation Specification. Revision, February. Sun Microsystems.

httpjavasuncomprodu cts jdk docsguidermispec

rmiTOCdochtml

[Waldo] Jim Waldo, Geoff Wyant, Ann Wollrath, Sam Kendall: A note on distributed computing. Sun Microsystems Technical Report TR, November.

httpwwwsunlabscomte chni cal repo rts ab stra ct h tml