Programming Languages for Mobile Code

TOMMY THORN
Inria/Irisa

Sun’s announcement of Java, more than anything else, popularized the notion of mobile code, that is, programs traveling on a heterogeneous network and automatically executing upon arrival at the destination. We describe several classes of mobile code and extract their common characteristics, where security proves to be one of the major concerns. With these characteristics as reference points, we examine six representative languages proposed for mobile code. The conclusion of this study leads to our recommendations for future work, illustrated by examples of ongoing research.

Categories and Subject Descriptors: A.1 [General Literature]: Introduction and Survey; D.3.2 [Programming Languages]: Language Classifications—object-oriented languages; C.2.4 [Computer-Communication Networks]: Distributed Systems—distributed applications; D.4.6 [Operating Systems]: Security and Protection—access controls

General Terms: Languages, Security

Additional Key Words and Phrases: Distribution, formal methods, Java, Limbo, mobile code, network programming, Objective Caml, Obliq, object orientation, portability, Safe-Tcl, safety, security, Telescript

1. WHY MOBILE CODE?

The expression “mobile code” has various meanings in the literature. Just to take three examples, let us cite the following.

—The term mobile code describes any program that can be shipped unchanged to a heterogeneous collection of processors and executed with identical semantics on each processor [Adl-Tabatabai et al. 1996].
—Mobile code is an approach where programs are considered as documents, and should therefore be accessible, transmitted, and displayed (i.e., evaluated) as any other document [Rouaix 1996].
—Mobile agents are code-containing objects that may be transmitted between communicating participants in a distributed system [Knabe 1996].

In this survey we refer to mobile code as software that travels on a heterogeneous network, crossing protection domains, and is automatically executed upon arrival at the destination. Protection domains can be as big as a corporate network and as small as a personal hand-held digital assistant. We believe that this characterization is general enough to encompass most usages and

This work was performed under the auspices of Dyade (an R&D joint venture between Bull and Inria). Author’s address: T. Thorn, Inria/Irisa, Campus de Beaulieu, 35042 Rennes Cedex, France; email: [email protected]. Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. © 1997 ACM 0360-0300/97/0300–0213 $03.50

ACM Computing Surveys, Vol. 29, No. 3, September 1997

precise enough to exhibit the distinctive features of this technique. For example, it excludes the cases where code is loaded from a shared disk or downloaded (manually) from the Web.

Mobile code supports a flexible form of distributed systems where the desired nonlocal computations need not be known in advance at the execution site. The advantages of this model are many, including:

—Efficiency. When repeated interactions with a remote site are needed, it can be more effective to send the computation to the remote site and to interact locally. This is especially the case when the latency of the network is high and the interactions consist of many small messages.
—Simplicity and flexibility. The maintenance of a network can be much simpler when the applications are located on a server and clients themselves install them on their sites on demand. Installing new or updated software becomes independent of the nature and number of clients. In some cases, it is even impossible to know in advance all the pieces of code that will be needed at a given site.
—Storage. Loading code on demand rather than having all programs duplicated on all sites can significantly reduce the total storage requirement.

We start with a short review of some of the most well-known examples of applications using mobile code.

—PostScript (a registered trademark of Adobe Systems Incorporated) is a page-description language designed by Adobe Systems. PostScript is remarkable in that it is also a stack-based programming language and is associated with a large standard library suitable for page rendering. Printing on a PostScript printer consists of composing a program describing the pages to be printed and sending this program to the printer. The printer then executes this program and prints pages as a side effect. This example is a good illustration of some of the benefits that motivate mobile code: the algorithmic description of a complex image can be made very compact and general, independent of printer specifics such as resolution and number of available colors. Also, PostScript printers off-load some of the work burden from the printing host. This compactness, expressiveness, and device independence have made PostScript a de facto standard today.
—Database technology is another area where a form of mobile code has long been used advantageously. The size of a typical database makes it infeasible to transmit it in its entirety to a client; thus any database operation must be communicated to and performed by the database server. Today, most commercial databases support the ANSI standard query language SQL for database access. SQL offers a compact notation for expressing complex operations on multiple database relations.
—Documents with embedded executable contents transmitted on the network are another kind of mobile code. Multimedia documents in the Andrew System [Hansen 1990] can include executable scripts written in an object-oriented script language. These enriched documents can be sent as electronic mail or posted as news articles. The scripts are executed under user control and can present interactive dialogues and use graphical facilities. A similar extension for the Internet multimedia standard MIME has been proposed. This extension enables the use of embedded executable content written in Safe-Tcl [Borenstein 1994], a restricted variant of the script language Tcl.

The usefulness of mobile code has also been realized for the World Wide Web. One of the problems with the Web is that interactive pages are impractical in general due to network latencies. Even when latencies are


not a problem, the content and appearance of Web pages are constrained by what is expressible with the HTML language. Allowing embedded executable contents removes the constraints of HTML and network latency. Among the many existing proposals, the most well known is probably Java by Sun Microsystems.

—Finally, a fourth class of applications of mobile code addresses the problem of software distribution and installation, traditionally known to be costly and difficult. Lucent Technologies has developed Inferno [Lucent 1996], a mobile-code-enabled network operating system aimed mostly at media providers and telecommunication companies. Customers' decoder boxes or portable telephones running Inferno can be extended dynamically with software in response to their requests. In the same spirit, but in a different context, Sun has proposed the use of Java-capable network computers to replace workstations in corporate networks. The benefit here is the low cost of maintaining an Intranet (a local network using Internet technologies) of network computers that download all applications on demand from an application server.

It should be noted that there are subtle differences in the usage of mobile code in the preceding four applications; in the PostScript and the database models, it is the sender of mobile code that takes the initiative of the communication, whereas in the Java and Lucent models, the execution site takes the initiative to load the mobile code. We come back to this distinction when we present programming languages for mobile code.

In this survey we first identify the special concerns for mobile code and their impact on programming languages. In Section 3 we focus on the two most important issues: safety and security. With this background, we examine, in Section 4, six representative languages that have been proposed for mobile code. Section 5 draws the lessons of this study and compares the languages studied. We conclude the article by giving our perspective on the current state of mobile code programming languages and pointing out directions for future research.

2. PROGRAMMING LANGUAGE CONCERNS

From the application classes of mobile code listed in the introduction, we can extract a number of common needs in terms of programming languages.

—The need for portability. Inherent in the idea of mobile code is the notion of heterogeneous execution sites. It is not possible to have a specific version of the code for every possible architecture, thus the need for portability. The portability issue also has an impact on the kind of services the application can expect from the executing host; for example, trying to write to a file might not be meaningful on a hand-held telephone.
—The need for safety. We use the word “safety” here in the sense that a bug in a safe application should not affect the execution of other independent parts of the environment. Limited assumptions can be made on code imported from an unknown source, which means that it cannot be trusted a priori to be safe, and special protection must be provided. Notice that this definition addresses safety from an operating-system point of view. Safety can also be imposed by appropriate restrictions at the programming-language level. For example, free access to memory can be eliminated through a disciplined pointer model and systematic checking of the bounds for array accesses.
—The need for security. As the mobile code crosses protection domains, special care must be taken to protect against security threats presented by mobile applications. The boundary between safety and security concerns is


not always clear-cut. Safety is mostly concerned with the behavior of systems in the presence of bugs, but as a lack of safety can be exploited for security breaches, safety becomes a necessary (but not sufficient) prerequisite for security.

We distinguish between four security properties [Russell and Gangemi 1991].

    Confidentiality, also known as secrecy: it concerns the absence of leakage of private information (which often occurs through a covert channel, i.e., a channel that is not explicitly intended for communication).
    Integrity, also known as accuracy: private data should not be modifiable by unauthorized parties.
    Availability, the negation of which is denial of service: the attacker denies normal use of shared resources, for example, by overloading them.
    Authenticity guarantees that the identity of a communication partner can be trusted.

—The need for efficiency. Efficiency is almost always an issue for programming languages and their implementations. The special need here is for a minimal overhead for the measures taken to ensure portability, safety, and security.

Portability and efficiency are issues that have been studied in the programming language community for quite a long while. The safety and security issues are well known in the context of operating systems, but safety and, especially, security are issues that have not received enough attention so far in the area of programming languages. Security and safety problems take on a new dimension in the context of mobile code. For this reason, we focus on these issues in the next section.

3. SAFETY AND SECURITY ISSUES

Safety and security concern many aspects of a system. We distinguish four levels at which to address these issues: the communication level, the operating-system level, the abstract-machine level, and the programming-language level.

3.1 The Communication Level

In a top-level view of a computer system, we consider a collection of computers connected with some networking technology. Safety concerns here require a robust protocol implementation that can withstand a faulty or malicious communication partner.

For networks like the Internet, where data can pass through untrusted intermediate hosts, the communication itself cannot be assumed to be secret or authentic. Therefore secure protocols based on cryptographic techniques are employed to guarantee confidentiality, integrity, and authentication, even on such networks. This is included in the new Internet protocol, IPv6 [Huitema 1996], but many proposals are also layered on top of existing protocols. Secure HTTP [Rescorla and Schiffman 1996] and the Secure Sockets Layer [Freier et al. 1996] are two examples. Availability is also an issue, but dealing with it is very difficult. This difficulty is illustrated by the many denial-of-service attacks on the Internet (see, e.g., Fox [1996]).

3.2 The Operating-System Level

Safety and security at the communication level are not sufficient in general. Handling safety and security is also a primary concern at the operating-system level.

Safety is generally ensured through the use of hardware memory protection. This isolates a process from the rest of the system, leaving operating-system calls as the only accessible interface. As no assumptions need to be made on the nature of the process, this leaves a great degree of freedom to the implementation, for example, to choose any programming language available. For mobile code, it has the problem of being very dependent on the operating system and the hardware, and thus not portable. Even when using memory protection is possible, it may not always be desirable.

—Memory protection means that all communications have to cross a protection boundary, for example, using a system call mechanism, and this can be expensive.
—Many smaller systems, for example, personal digital assistants (PDAs), do not have the hardware needed.
—Requiring the use of memory protection makes embedding the mobile code environment in another application much more complicated and often impossible.

Confidentiality and integrity can be achieved by controlling processes’ access to information and communication channels. Complete control of covert channels is very hard and rarely attempted in nonclassified systems. A form of availability can be attained by using limits on resources, such as disk space, number of processes, and memory usage, and using preemptive scheduling and timeouts in locks. Authentication is usually established through an initial identification of the user (e.g., using a password scheme) and maintained by data structures of the operating system, protected from tampering by user-level processes.

3.3 The Abstract-Machine Level

The safety guarantees obtained through the use of hardware memory protection can also be realized using an abstract machine. Using a language-independent abstract machine retains all the language independence of the operating-system solution, but does not have the portability problems. In the simplest version, the protection boundaries are enforced by an interpreter, performing all the needed checks at run-time.

In the Omniware model [Adl-Tabatabai et al. 1996] the overhead of interpretation is eliminated through software fault isolation (SFI). Code for the Omniware abstract machine is translated almost directly into native machine code, but all memory accesses are translated to code that checks for accesses outside a given boundary.

The self-certified code (SCC) [Necula and Lee 1996] technique goes even further and eliminates the overhead of the protection as well. Self-certified code is the pair of machine object code and a machine-checkable formal proof. The proof demonstrates that the object code respects the execution site’s published (low-level) safety policy. This policy comprises a set of proof-formation rules, along with a set of preconditions. The correctness proof can easily be verified automatically and ensures that the code respects the (low-level) safety policy and can therefore run without run-time checks.

3.4 The Programming-Language Level

Another way to obtain the required safety is to sacrifice the language independence and use programs written in a safe programming language. Most modern programming languages guarantee against low-level errors through mechanisms such as typing, restricted pointers (which are like references known from, for example, Standard ML: the only valid operations on a pointer variable are dereferencing and assignment), automatic memory management, and array bounds checking. It is possible to go even further and use the language's scope and access rules to protect the interface of resources. The gain here is that the security implementation, such as resource management and control, can be written in the source language and used as a library.

As an optimization, the high-level program can be compiled and type-checked before being shipped as mobile code. The question then arises as to how to make sure that the object code is really an untampered output of a correct

compiler. Three techniques have been proposed.

—Using cryptographic signatures to reduce the problem to one of trusting the author. As we already trust major software producers enough to run their applications, this can be seen as a continuation of current practice.
—Using cryptographic signatures to trust the compiler. The idea is to ship the source to one of a small number of trusted compilation sites for compilation and certification.
—Compiling to an intermediate language that can be (type) checked to verify the same constraints as are imposed on the source language. The success of this approach depends on the intermediate language being suitable for efficient verification and permitting an efficient abstract machine.

These techniques are not exclusive. For example, combining the first two seems easily feasible as they require much of the same technology and infrastructure. Likewise, the abstract machine can include operating-system aspects and can be more or less language-dependent. For instance, depending on the context, we can consider the system libraries either as part of the abstract machine or as part of the language. The languages examined in the following all use combinations of these levels, although this is generally not made explicit.

4. PROGRAMMING LANGUAGES FOR MOBILE CODE

We focus here on a list of representative languages for mobile code. Space considerations prevent us from presenting all the relevant languages. Among the other programming languages for mobile code, let us mention JavaScript [Netscape 1997] and Visual Basic. The interested reader is referred to the bibliography for a more extensive overview of the field.

The first four languages studied, Java, Limbo, Objective Caml, and Obliq, are general-purpose languages, intended for general application development. The last two, Safe-Tcl and Telescript, are special-purpose languages. We expect Java to be the best known language amongst these, and therefore give it a more detailed treatment and use it as the reference point for the other languages.

4.1 Java

Java is a class-based object-oriented language created by Sun Microsystems, with an emphasis on portability and security [Arnold and Gosling 1996; Gosling et al. 1996]. As an example of how to use Java for mobile code, JavaSoft has created the applet model. Applets are small applications that are automatically downloaded and executed upon visiting a Web page containing them.

The Language

For clarity, we distinguish three levels in the presentation of Java: the language level, the abstract-machine level, and the library level.

Language level. The language is based on a simplified variant of C++ with all unsafe and most complicated language features removed. The features that have been removed include unsafe operations such as pointer arithmetic, unrestricted casts, and unions, and features leading to unmaintainable programs such as the C preprocessor, unstructured gotos, operator overloading, and multiple inheritance. Automatic memory management has been added, guaranteeing against pointer errors due to manual memory management and making usage of dynamic memory much simpler. Array and string types are built in, with range checks on all accesses. Exception handling has also been added, favoring the creation of robust programs. Finally, for concurrency, Java provides threads and serialized

methods, which use mutex locking on the corresponding object.

Java includes a novel notion of interface types. Interfaces define a collection of abstract methods and constants with their associated types. A class can be declared to implement an interface, in which case it must implement all the abstract methods of the interface. Anywhere a value of an interface type is expected, a value of a class implementing this interface can be used. Interfaces are useful for a number of purposes: they can be used to hide the implementation of a class and to group classes with common functionality without forcing them into a class hierarchy.

Java also uses a notion of package. A package groups a number of class and interface definitions. Unlike most module systems, Java packages are open-ended and can be extended with definitions not envisioned by their original creator.

The default visibility of class and attribute definitions can be changed with a visibility modifier keyword. A class can be declared final, preventing derivation of subclasses; abstract, preventing creation of instances; or public, making it visible outside the containing package (by default, the scope of a class declaration is limited to that package). Attributes have four (ordered) levels of visibility: private, default, protected, and public. Private attributes are visible only within the defining class, that is, not in subclasses or in other classes of the same package. The default visibility extends visibility of the attribute to the package in which it is defined. Protected attributes further extend the visibility to subclasses of the defining class, potentially defined in another package. Finally, public attributes are visible everywhere.

Abstract machine level. The Java Virtual Machine (JVM) is a language-dependent abstract machine that is close enough to Java that its object code can be checked to respect the language semantics. In addition to these static (load-time) verifications, the JVM must implement dynamic checks to guarantee the safety of the language. These include bounds checks on array and string accesses, checks on casts to a more specific type, checks for method invocations on null pointers, and the like.

Library level. Complementing the language, a library provides general-purpose data structures, support for graphical user interfaces, and access to network communication. Applications written using these libraries run unchanged on a wide range of platforms, including Unix, Windows, and Macintosh.

Security

The Java language, as described, is a modern “safe” language, guaranteeing that type and access rules are always respected. This in turn enables a low-level security policy to be expressed within the language itself. The visibility rules for classes and attributes play a crucial role in this respect. Indeed, the interface to local resources is provided by libraries, protected by the scope and visibility rules. Most resources requiring dynamic access control, such as the file system or access to the network, are controlled by a centralized security monitor, called the SecurityManager. The SecurityManager has an abstract type that cannot be instantiated by an applet. All security-related methods are declared final, so that applications and applets are forced to use the appropriate code. Without this protection, malicious applets could redefine the method in a subclass, potentially circumventing the security invariants. A final class enjoys even stronger protection, in that the inability to create subclasses also implies the inability to define new methods with access to protected attributes.

Linking

The loading of classes over the network is done by an object of the class ClassLoader. This object is created during

startup and cannot be replaced by applets afterwards (it is part of the SecurityManager state). As attributes with a default or protected visibility are fully accessible within the package of their definition, these two visibilities would be of little use if applets could introduce classes freely in any package and thus avoid the intended protection. To prevent this, the ClassLoader protects a fixed set of packages from being extended by applets. The exact set is not specified, but includes java and sun. The ClassLoader also maintains a unique name space for each network source, separate from the name space for classes coming from the local file system. Network sources are currently distinguished based only on their symbolic address.

Classes can be loaded from the local file system if they are present in a directory specified in the CLASSPATH variable. This variable is part of the Java environment configuration and can be changed by the user before launching the network browser or Java client, but cannot be accessed or modified by an applet.

As class files loaded through the network cannot be trusted to be untampered with and the abstract machine runs with few type checks, the bytecode is passed through a bytecode verifier that checks that the object code respects the Java semantics: it ensures that the bytecode is in a valid format, that pointers are not forged, that access rules are enforced, that the operand stack is used consistently with respect to the types, and that the parameters passed have the expected types.

Applications

There is no single well-identified application area for the language: any distributed and portable application can take advantage of Java. Sun and Oracle emphasize Java’s capabilities as a general-purpose language for corporate Intranets of diskless network computers. Most commonly encountered applets are animations, games, demonstrations, and interactive multimedia educational programs; but major software houses have already demonstrated complete office application suites written as applets.

Moving beyond applets, let us mention some of the more substantial offerings (some are currently under development) [JavaSoft 1996]:

—JavaOS, a complete operating system written in Java, offering portability and extensibility;
—Jeeves, a framework for extensible network servers;
—Java Management API, a framework for management of heterogeneous networks;
—Java Electronic Commerce Framework, a software point-of-sale terminal accessible by any Java-enabled browser;
—Java Beans, a component architecture for reusable software components;
—Java Database Access API, a uniform interface to relational databases; and
—Java RMI, an API for implementing remote method invocation. This will ease the creation of client-server applications and permit the creation of more traditional distributed systems.

Reflections

Java is a promising language with tremendous market acceptance. Much of this popularity stems from its unique combination of characteristics: close to C++, safe, portable, and concurrent, as well as supplying a rich base library.

Since the first presentation of Java, a number of “safety” bugs have been discovered [Dean et al. 1996]. It is of concern that many of the sources of the bugs can be attributed to the vague nature of the definition of Java. Although the core language seems simple, many details are in fact quite subtle. For example, in Java the integrity of the security depends upon applets not being able to instantiate subclasses of

critical classes, like ClassLoader. This condition is checked at run-time by the constructor (a special method devoted to initializing the object upon construction), which throws an exception in case of violation. If the applet can catch this exception within the constructor, it has succeeded in the instantiation, although the object will only be partially initialized. The subtle restriction imposed on the constructor to avoid these situations was checked by the compiler, but not enforced by the bytecode verifier in an early version of Java [Dean et al. 1996].

Java’s current security implementation can only be seen as a first step, as it has a number of shortcomings. For example, as noted in Billon [1996], it currently does not scale beyond simple applets. Many of the prospective applications for Java, such as the ones mentioned in the following, require additional local libraries. Unfortunately, there is currently no way to protect user-defined libraries from redefinitions and extensions by applets. Only the system-defined fixed set of packages is protected. The fact that packages in Java are always extensible makes it impossible to guarantee the security of a package based on its source alone; the semantics of the ClassLoader must be taken into account as well. This seems against the spirit of Java’s language-based security, and can be a serious problem considering that the ClassLoaders of the major Java applet environments available today do not have identical semantics [Billon 1996].

We strongly believe that a formal approach to security in Java could help avoid most of these weaknesses and would result in a much cleaner and more coherent design. Work on formalizing Java is underway, and progress has been made recently on formal studies of Java’s type system [Drossopoulou and Eisenbach 1996].

4.2 Limbo

The technologies of three major industries, entertainment, computing, and telecommunication, are converging. Inferno [Lucent Technologies 1996] by Lucent Technologies (Bell Labs Innovations) is a network operating system designed to suit the constraints and needs of this environment. Inferno is intended to be flexible enough to be employed on devices as diverse as intelligent telephones, hand-held computers, personal digital assistants, television set-top boxes (the consumer receivers and decoders for the usually scrambled television signals distributed typically by satellite or cable), home video game consoles, and inexpensive network computers. It can also be used on servers such as Internet servers, financial servers, and video-on-demand servers.

The design of Inferno is based largely on Plan 9, a network operating system, also from Lucent Technologies, which emphasizes portability, versatility, and an “economical” implementation. Economical here refers to the computing resources required; Inferno can run within little memory and does not require virtual memory hardware. Portability has two dimensions in Inferno. The operating system itself can run on the bare hardware, or on top of an existing operating system such as Unix, Windows NT, or Plan 9. In the latter case, the services provided by Inferno are interfaced to the native services of the underlying operating system. The second dimension is the portability of Inferno applications. Applications are written in Limbo, an integral part of Inferno, and are compiled to a binary format that is portable between all Inferno implementations. Inferno provides a unified file system interface to operating system services, which hides the fact that the service can be local or remote.

The Language

As for Java, it is useful to distinguish between three levels of Limbo: the language level, the abstract machine level, and the library level.

Language level. Limbo is a “safe” imperative language. Its main inspiration is C, but it includes in addition declarations as in Pascal, abstract data types, first-class modules, first-class channels, automatic memory management, and preemptive scheduled threads. It excludes pointer arithmetic and casts.

Abstract data types (ADT) declare objects with variable, constant, and function fields. There is no notion of inheritance for ADTs and no visibility declaration for ADT members. Recursive structures are subject to a few simple syntactic constraints to guarantee that cyclic data cannot be created (recursive fields cannot be assigned). Fields declared cyclic do not suffer from this constraint. The reason for this particularity is the way Limbo’s garbage collection handles cycles (see the following).

The declaration of a module identifies the types of exported functions and contains the exported declarations of ADTs, simple type declarations, and constants. In order to use a module, it must be instantiated by loading an implementation of the module (at run-time). The loading is done with the built-in function load that takes a module type and a path to the module implementation and returns the instantiated module (or null if unsuccessful). This allows the program to choose among several implementations of a given module at run-time.

A Limbo channel allows the communication of any value of its declared type. A channel can be connected directly to another process or, through a library call, to a named destination. Channels are the only built-in primitives for interprocess communication, but more complicated mechanisms can be built upon them.

Limbo’s garbage collection is based on reference counting and reclaims the memory of noncyclic data immediately, once the last reference to them disappears. Reference counting has the limitation that it cannot reclaim cyclic data, so cyclic data are treated in a specific way and are eventually reclaimed.

Abstract machine level. Limbo programs are compiled to a RISC-like abstract machine called Dis, designed for just-in-time compilation to efficient native code by the Limbo run-time system. Dis does not impose language constraints; for example, Dis code need not be type checkable. Lucent claims that Dis is well suited for real microprocessors and reports excellent performance of their implementation.

Library level. Limbo provides a rich library of standard modules, including modules for network communication, secure and encrypted communication, and graphics.

To reflect the different uses of Inferno, two user-interface libraries are available. One, based on Tk [Ousterhout 1994], is intended for “traditional” window-based user interfaces. The other provides ready-made interface components for typical embedded applications, such as interactive TV. The specialized design allows for a minimal memory requirement.

Security

Safety in Inferno is achieved through a safe language with restricted pointers and automatic memory management. Pointers can point to any object but cannot point inside arrays, and there is no pointer arithmetic or arbitrary pointer conversion. This safety is not enforced by the abstract machine, though. Instead, Inferno relies on applications being signed by trusted authorities who guarantee their validity and behavior.

Security management in Inferno is inspired by the Plan 9 operating system; all resources are accessed as files, including data, network communication channels, and the executable modules that constitute the applications. All resources available to an application appear exclusively in the name space of that application. Applications cannot arbitrarily manipulate this name space themselves, but must, for security sensitive resources, call the modules that provide them.

Linking

The built-in support for dynamic linking of modules provides type-safe linking at the user level. Another (type-safe) way to provide a module is to transmit it on a channel of the appropriate type. Currently, certain data types, such as modules, cannot be exported outside a machine [Pike 1997].

Applications

The application domain for Inferno is focused on applications for service providers. In such environments, only a few, usually fixed, sets of authorities need to be trusted, which justifies the use of cryptographic signatures.

Reflections

Lucent’s use of an operating system basis provides a clear separation of responsibilities among the language, the abstract operating system, and the run-time environment. This separation reduces the complexity of verifying security consistency and eases the isolation of security breaches. The module system of Limbo enables a clean and type-safe way to implement dynamic loading.

4.3 Objective Caml

Objective Caml (O’Caml) [Leroy 1997] is a functional language in the ML tradition, originating from Caml, a language developed at Inria that is widely used in education. O’Caml has been used as a language for mobile code in the development of the MMM [Rouaix 1996b, 1996a] Web browser, also developed at Inria. MMM adds the possibility of dynamically linking and executing O’Caml applets accessed through the Web. MMM provides a number of hooks for the applets; for example, applets can add elements to the user menu, include new content decoders, or change the handling of link activation. Applets can access parts of the browser internals, such as the cache and browser navigation values.

The Language

O’Caml includes imperative features, such as references and assignment, and a class-based object system, all integrated within a functional core. The main characteristics of O’Caml and its implementation are:

—Strong, static polymorphic typing. The static typing property ensures that “well-typed programs cannot go wrong”; that is, they cannot terminate with a type error. All type errors are caught during compilation. Primitives’ errors, like division by zero, are not considered type errors, but are handled through the exception mechanism.

  Polymorphic typing allows types to be parameterized over a number of type variables. This makes possible a type-safe construction of general functions. For example, a function calculating the length of a list is not dependent on the type of the list elements. With polymorphic typing, it can be defined once (with type α list → int, where α is a type variable) and then used for every kind of list.

  O’Caml offers automatic type reconstruction, as is usual with languages in the ML family. For documentation and debugging purposes, it is often useful to manually annotate key functions with their type.

—A powerful module system. O’Caml offers a rich higher-order module system in which modules have signatures, providing the names and types of exported elements. O’Caml permits higher-order modules through the notion of functors. A functor can be instantiated by applying it to modules with the same signatures as its formal arguments. The unit for separate compilation is the file, which implicitly defines a module of the same name.

—First-class (higher-order) functions. Functions can be passed to other functions or returned as results. They can be anonymous (defined using the lambda notation of lambda calculus), and can refer to variables outside their own definition (free variables). Higher-order functions are at the heart of functional languages, and are thus not specific to O’Caml.

O’Caml includes support for concurrency through threads and mutexes (although applets do not support the use of threads) and class-based object orientation through an extension of the typing discipline. An object is an instance of a static class, which can be the specialization of multiple superclasses. In the case of name clashes, only the last entry is visible, but the shadowed name can still be accessed. This is a simple solution, but it entails an asymmetry in which inheriting from A and B is different from inheriting from B and A. O’Caml supports a small number of visibility modifiers for classes and object attributes: a class can be declared virtual, preventing instances from being created, and closed, preventing subclasses from being derived. Attributes can be declared private, making them inaccessible outside the methods of the defining class.

Security

MMM applets are only allowed to use safe variants of the standard libraries. A safe library imports everything exported from the unsafe original, but it only re-exports a selected subset. Entities considered to be safe are re-exported directly, but sensitive structures are exported as abstract data types; for dangerous functions, wrappers are exported that check the capabilities of the applet (a wrapper for a function is another function with the same signature that may perform extra computations before and after calling the original).

The capabilities of an applet are represented by an abstract data type with one function to access the encapsulated capabilities and another one to ask the user for extended privileges.

As MMM applets are transmitted in object form, the question of how to trust the binary object code arises. Unlike Java, the object code is not verified before execution, but is instead associated with a cryptographic checksum of the interfaces of the imported modules. For protection from unsafe compilers, the MMM designers suggest employing trusted compilation sites to certify that the object code is the result of a correct compilation.

Linking

O’Caml includes library support for dynamically linking object files. As there is no way to access the loaded entities, dynamically loaded modules are themselves responsible for registering the functions they export. The dynamic linking used for applets constrains the use of the primitives that are considered dangerous.

All applets consist of exactly one (potentially big) module, which may contain nested modules. As the applets are self-contained and only allowed to use a fixed set of development modules, they cannot interfere with one another, thus avoiding the complications of separate name spaces for applets.

Applications

Applications for MMM include browser extensions, decoders for new contents types, animations, games, and the like. The advantage of O’Caml is a richer language, with support for several programming paradigms: functional, imperative, and object-oriented.

Reflections

In Java, since all standard library functions can potentially be called by an applet, they must all be secured. With MMM, only functions exported by the safe libraries need to be checked. The major drawback of MMM is the need for trusted compilation sites.

4.4 Obliq

Obliq [Cardelli 1995] of DEC Systems Research Center is a lexically scoped, dynamically typed, prototype-based language, designed for distributed object-oriented computations. Computations in Obliq are network-transparent; that is, they depend neither on the allocation site nor on the computation site, but the distribution is managed explicitly at the language level.

The Language

To support network transparency, Obliq extends the static scope to the network: free variables of transmitted computations can refer to objects from the origin site. The language has three main characteristics.

—Any value can be transmitted between hosts, including closures and object references. Objects themselves are local to a site and are not considered as values, but object migration can be programmed with a combination of closure transmission, aliasing, and object cloning (see the following).

—Obliq belongs to a class of object-oriented languages called prototype based. In prototype-based languages there are no classes, and objects are created by copying (cloning) existing objects (the prototypes). Obliq uses a simple variant of prototyping, called embedded prototyping, which avoids all the complications of delegation-based prototyping [Malenfant 1995]. In embedded prototyping, all the methods valid on an object are contained in the object itself; that is, they are not searched for in a list of superclasses.

—Obliq is dynamically typed. Type errors are caught cleanly and propagated to the origin site.

An object in Obliq is a collection of attributes (named values). A simple point object p can be written {x => 3, y => 4}. There are four basic operations on objects.

Selection/invocation: using the value of an attribute or invoking a method, for example, p.x and display.plot(p).

Updating/overriding: changing the value or the method bound to an attribute, for example, p.x <- 4 and display.plot <- lineto. Notice that it is legal to change a value into a method and a method into a value.

Cloning: creating a shallow copy of an object. The immediate values of attributes are copied, but structured values introduce sharing. For example, array elements are shared between the clone and the original object. Cloning is generalized to support mixing several objects with disjoint names. Using the given examples, clone(p,display) produces an object with at least the attributes x, y, and plot.

Aliasing: redirecting attributes to attributes in other objects. All selections and updates on an aliased field are done on the redirection target. For redirected method invocation, the “self” object is the object containing the redirected target, not the object containing the alias. An alias itself can redirect to another alias. Objects consisting only of aliases are called surrogates (also known as proxies in other languages). For examples of aliasing and redirection, consider {x => alias x of p1} and redirect p2 to p end. The latter makes all attributes of p2 aliases of the corresponding attribute of p.

Objects can be protected against modification, aliasing, and cloning from outside the object using the protected keyword. Safe interfaces to objects can be constructed through a combination of protection and surrogates.

Concurrency is inherent in Obliq; processes can execute independently on distinct servers and processes can spawn new threads locally. To handle concurrent accesses, Obliq supports serializing objects. An object is serialized if at most one thread can access an object or run one of its methods at any given time. This is realized using a mutex on the object, which is acquired when one of its methods is invoked and released when the method returns. To avoid trivial deadlocks, operations and method calls from within the object itself are not subject to locking. For details, see Cardelli [1995].

Lexical scoping hides named values from outside a given block and run-time typing ensures that these scope rules are enforced. Extending the lexical scope to the network makes it possible to use scope rules to address security issues, such as information hiding; a procedure executing on a foreign server has only access to its own parameters and free variables. Communications between two independent servers are mediated by a shared global name server, which allows servers to import and export local values. To have a procedure executed on a distant server, the name server is asked for the engine object accepting procedures. For example, remote invocation can be programmed as follows (on the client side):

    let mydisp = net_import("display", Namer);
    mydisp.plot(p);

Here the name server, Namer, is asked for the value registered with the name display. After this call, mydisp is a reference to the display object, either locally or on a remote server, and is treated like any other object. For example, we can invoke the method plot with a variable p. This example assumes that some process has exported display with

    net_export("display", display)

Object migration can be programmed using a combination of closure transmission, cloning, and surrogates. The following example is taken from Cardelli [1995].

    let migrate =
      proc(obj, engineName)
        let engine = net_importEngine(engineName, Namer);
        let remoteObj = engine(proc(arg) clone(obj) end);
        redirect obj to remoteObj end;
        remoteObj;
      end;

To migrate an object obj to an engine, we first ask a name server for a reference to the engine. Next, we remotely execute on this engine (engine(...)) a cloning operation of obj, resulting in a remote object.³ Finally, all attributes of obj are made aliases for the corresponding attributes of the remote object (redirect obj to remoteObj end).

³ Note that the engine-specific information supplied as parameter arg to the migrated closure is not used here.

The distributed object model is closely based on (and implemented with) the Modula-3 network objects [Birrell et al. 1993].

Security

Besides the basic use of scope to control what is exported, no special provision for security in Obliq is made at the time of writing [Cardelli 1995].

Linking

Transmitted closures can use functions from the basic library, but do not otherwise gain access to names from the receiving site. Names are explicitly exported by passing them as parameters to the received closure.

Applications

Obliq is an academic project, and the collection of example applications is small. Examples include a user-interface toolkit, algorithm animation, 3D graphics, and its use as the basis of the Visual Obliq distributed application builder.
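To make the interplay of cloning, aliasing, and redirection in the migration procedure more concrete, the following is a minimal single-process sketch in Python. It is not Obliq and does not use any Obliq or Modula-3 API: the names `Obj`, `Alias`, `clone`, `redirect`, `local_engine`, and `migrate` are hypothetical, and the "engine" is only a local function standing in for remote execution.

```python
class Alias:
    """A redirection to attribute `name` of another object `target`."""

    def __init__(self, target, name):
        self.target = target
        self.name = name


class Obj:
    """Toy prototype-style object: a collection of named attributes."""

    def __init__(self, **attrs):
        self.attrs = dict(attrs)

    def get(self, name):
        slot = self.attrs[name]
        if isinstance(slot, Alias):
            # Selections on an aliased field go to the redirection target
            # (which may itself hold another alias).
            return slot.target.get(slot.name)
        return slot

    def set(self, name, value):
        slot = self.attrs[name]
        if isinstance(slot, Alias):
            # Updates on an aliased field also go to the target.
            slot.target.set(slot.name, value)
        else:
            self.attrs[name] = value


def clone(*objs):
    """Shallow copy that mixes several objects with disjoint names."""
    merged = Obj()
    for obj in objs:
        for name, slot in obj.attrs.items():
            if name in merged.attrs:
                raise ValueError(f"attribute name clash: {name}")
            merged.attrs[name] = slot  # structured values stay shared
    return merged


def redirect(obj, target):
    """Turn every attribute of `obj` into an alias of `target`'s attribute."""
    for name in obj.attrs:
        obj.attrs[name] = Alias(target, name)


def local_engine(proc):
    """Stand-in for a remote engine (assumption): just runs the closure."""
    return proc("engine-specific info")


def migrate(obj, engine):
    """Clone `obj` at the engine, then leave `obj` behind as a surrogate."""
    remote_obj = engine(lambda arg: clone(obj))  # `arg` is unused, as in the original
    redirect(obj, remote_obj)
    return remote_obj


p = Obj(x=3, y=4)
remote_p = migrate(p, local_engine)
p.set("x", 10)  # the update is forwarded to the migrated copy
```

After `migrate`, the original object holds only aliases, so it plays the role of a surrogate: every later selection or update on `p` is carried out on `remote_p`.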

Reflections

Using network-wide scope for distributed applications leads to an elegant and powerful model of programming. As object migration can be expressed within the language, it is possible to program autonomous traveling agents in Obliq. This is not possible in the model employed by Java and O’Caml. Without further experimental results, however, it is difficult to evaluate the advantages and drawbacks. It seems that this model could be inefficient, leading to many small messages to be transmitted: one for each access to a remote object. Also, Obliq seems to require a much tighter coupling of hosts to support a distributed garbage collection.

4.5 Telescript

Telescript [General Magic 1996] of General Magic is an object-oriented class-based language designed for network programming. Telescript is intended not as a general-purpose language but as a specialized language for communication, just as PostScript is a language for printing. The system is based on a number of metaphors from the real world. The central concept in Telescript is the agent, which autonomously travels on the Telesphere (a Telescript network of engines) doing business on behalf of its owner. The engine is a Telescript interpreter with a collection of built-in classes and an engine place. Engines provide persistence of objects, even in the presence of a system crash. Places are stationary processes that can accept incoming traveling agents. Users can create their own places nested within other existing places. Resource usage can be tracked and charged to the responsible user.

The Language

The Telescript language itself is class-based and includes run-time typing. Classes can inherit from a single superclass and any number of mix-ins, abstract classes that cannot be instantiated. Mix-ins can themselves inherit from other mix-ins.

Use of classes can be restricted in two ways: a sealed class cannot be specialized, and an abstract class cannot be instantiated. Attributes of objects can be either private or public. Private attributes can only be accessed from the class itself and its subclasses, whereas public attributes are unrestricted. The operator protect, a novelty of Telescript, turns object references into protected references. A protected reference cannot be used to modify an object.

Agents are processes with a number of properties.

—The telename is the pair of an authority and a process identity that together name a process. The authority identifies the (usually human) Telescript user.

—The owner is the process that will own future created objects (except processes, which own themselves). This is usually the current process, but it can be temporarily changed. Objects not owned by any process are garbage collected.

—The sponsor is the process whose authority will be attached to and charged for newly created objects.

—The client is the object whose code requests the current operation.

—The permit specifies the capabilities of the current process. A permit has a number of process parameters:
  —the age is the maximum life in seconds,
  —the extent is the maximum size of memory allowed to the process,
  —the priority is used to determine when to schedule the process for execution, and
  —the Boolean parameters canCreate, canGo, canGrant, and canDeny specify whether the process can create new processes, travel, raise the permission level of other processes, and lower the permission level of other processes, respectively.
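The permit parameters listed above can be pictured as a simple record. The sketch below is Python, not Telescript, and is only an illustration: the `Permit` dataclass and the `intersect` helper are hypothetical names. Combining permits by taking the minimum of each numeric limit and the conjunction of each Boolean flag is one plausible reading of an "intersection (minimum)" of permits; treating priority the same way as age and extent is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permit:
    """Illustrative record of the permit parameters (not the Telescript class)."""

    age: int       # maximum lifetime, in seconds
    extent: int    # maximum amount of memory allowed to the process
    priority: int  # scheduling priority
    can_create: bool = False  # may create new processes
    can_go: bool = False      # may travel
    can_grant: bool = False   # may raise the permission level of other processes
    can_deny: bool = False    # may lower the permission level of other processes


def intersect(*permits: Permit) -> Permit:
    """Combine permits so that the result is no more permissive than any input."""
    if not permits:
        raise ValueError("at least one permit is required")
    return Permit(
        age=min(p.age for p in permits),
        extent=min(p.extent for p in permits),
        priority=min(p.priority for p in permits),  # assumption: lower is more restrictive
        can_create=all(p.can_create for p in permits),
        can_go=all(p.can_go for p in permits),
        can_grant=all(p.can_grant for p in permits),
        can_deny=all(p.can_deny for p in permits),
    )


# A generous permit held by a process, narrowed by a stricter one imposed on it.
process_permit = Permit(age=3600, extent=100_000, priority=5, can_create=True, can_go=True)
imposed_permit = Permit(age=60, extent=500_000, priority=3, can_go=True)
effective = intersect(process_permit, imposed_permit)
```

Because each field is combined independently, adding another permit to the intersection can only keep or reduce what the process is allowed to do, which is the property a layered permission scheme needs.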

Agents are sent by invoking their go operation with a ticket, specifying the destination place and possibly the route to this address. If the destination accepts the agent’s authority and permits, the agent is sent together with its objects to the place and resumes execution within the new place. The effective capabilities of a process are computed as the intersection (minimum) of the four permits process, local, regional, and temporary. The local permit is imposed by the place entered, the regional permit is imposed by the engine, and the temporary permit can be imposed by the language construction restrict. In the following we give an example (taken from General Magic [1996]) of how to execute a method from an untrusted object, using a temporary permit.

    paranoid := Permit( );
    paranoid.canCreate = false;
    paranoid.canDeny = false;
    paranoid.canGo = false;
    paranoid.canGrant = false;
    paranoid.age = *.age + 2;
    paranoid.extent = *.size + 1000;
    try {
        restrict paranoid {
            yourObject.yourSuspectCall( );
        };
        catch failed: PermitViolated {...};
        catch...
    }

First we create a new empty permit, named paranoid, which we initialize with very restrictive permissions. We set the maximal age to the current age plus two seconds and then allow it to allocate 1,000 bytes of storage. The suspicious call, yourObject.yourSuspectCall( ), is then executed in a try block using the paranoid permit. The try block enables us to catch violations, such as code running too long or using too much space.

Four built-in mix-ins are available for further protections on classes:

—unmoved: objects of this class cannot be taken along with a traveling agent;

—uncopied: objects of this class cannot be copied;

—copyrighted: objects of this class can only be instantiated if authorized by a Copyright Enforcer object; and

—protected: objects of this class cannot be modified.

Security

As is apparent from the preceding, security is an overall consideration that affects most of the Telescript language. The permission model is elaborate and applies to resource consumption as well (the model can thus address denial of service issues).

Linking

Mobile processes in Telescript are run in a separate domain and can only interact directly with the engine in which they run. All interprocess access is mediated by the engine.

Applications

Telescript is envisioned to make possible an electronic marketplace where users can launch their agents to search for and reserve tickets, inspect currencies, and the like.

Reflections

The Telescript system includes a number of features to restrict the actions of agents, but they seem to suffer from a lack of systematic design. It is not clear how to be convinced of the consistency of the implemented security restrictions.

A positive aspect of Telescript is that it tries to deal with denial of service attacks. Telescript agents have their own initiative to travel and are thus more powerful than Java applets, but in a sense, also more dangerous: they can be hard or impossible to control once launched. An interesting aspect of Telescript is that the user does not have to be connected to the network while his agent is acting. The agent can finish its business and return to the user once he reconnects to the network.

4.6 Safe-Tcl

The idea of executable contents had been realized in the context of electronic mail before the arrival of the World Wide Web. We present Safe-Tcl, the most popular among several similar proposals. First Virtual Holdings proposes Safe-Tcl as an extension to MIME, the Internet multimedia mail standard [Borenstein 1994]. The MIME standard defines a standard encoding for enriched mail; that is, mail with more than just ASCII text. MIME mail consists of several parts, each of which can have a different contents type. The simplest contents type is just the ASCII text, but contents types include several popular formats for pictures and sound. Safe-Tcl is proposed as an executable contents type of MIME, and thus as the standard language for executable contents within email.

Reflecting the store and forward nature of electronic mail, three different execution phases are distinguished: delivery-time, receipt-time, and activation-time. Delivery-time is when the mail leaves the sender, receipt-time is when the mail arrives at the destination, and activation-time is when the user reads the mail. The MIME header specifies in which of the three phases the program is intended to be executed. (These three phases coincide for Web pages.)

The Language

Safe-Tcl is, as the name implies, based on Tcl [Ousterhout 1994], a procedural script language designed to be simple, portable, easily embedded, and powerful. Efficiency was a minor design issue. For simplicity, every value in Tcl is represented as a string, including programs themselves. Scope rules are very simple in Tcl; there are only two scope levels: local (to a function) and global.

Security

Tcl is already a safe language in the sense that there is no notion of pointers, casts, or unchecked array accesses. The aim of Safe-Tcl is to be a safe and secure programming language. To achieve this, every language construction and primitive of Tcl was carefully examined. Primitives considered to be too dangerous or general were replaced by a collection of more specific ones. For example, the file system access functions were removed and replaced by an isolated global configuration space. This space is accessed using two functions: SafeTcl_setconfigdata and SafeTcl_getconfigdata. The former associates a string to a key, and the latter returns the string associated with a key. A rich but safe graphical user interface was a major concern in the design of Safe-Tcl. This has likewise been achieved by replacing primitives that are too general with more specific ones.

Linking

As the Safe-Tcl environment is likely to have a great deal of its implementation in Tcl, two interpreters are used: one only for Safe-Tcl, running the untrusted applications, the other for full unrestricted Tcl. The untrusted application can interact directly only with the Safe-Tcl interpreter.

Applications

Typical applications reported for Safe-Tcl include advanced user dialogues for ordering and voting. Safe-Tcl has also been used experimentally for applets in the SurfIt! Web browser [Ball 1996].

Reflections

Safe-Tcl’s inherent limitations in terms of efficiency place it in a weak position in the competition. On the other hand, the Safe-Tcl language is much smaller than any of the preceding five, known to be easy to embed, and requires only a small amount of storage.
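The two Safe-Tcl ideas described above, replacing general primitives by an isolated configuration space and running untrusted code in a separate restricted interpreter, can be illustrated with a small Python sketch. This is not Tcl and does not reproduce the Safe-Tcl implementation: `ConfigSpace` and `RestrictedInterpreter` are hypothetical names, scripts are modelled as lists of (command, arguments...) tuples, and returning an empty string for an unknown key is an assumption.

```python
class ConfigSpace:
    """Isolated key/value store offered instead of file system access."""

    def __init__(self):
        self._data = {}

    def set_config_data(self, key: str, value: str) -> None:
        # Associates a string with a key.
        self._data[key] = value

    def get_config_data(self, key: str) -> str:
        # Returns the string associated with a key ("" if unset: assumption).
        return self._data.get(key, "")


class RestrictedInterpreter:
    """Runs untrusted scripts that may only call explicitly exposed commands."""

    def __init__(self, commands):
        self._commands = dict(commands)

    def run(self, script):
        results = []
        for name, *args in script:
            if name not in self._commands:
                # Anything not exposed by the trusted side is simply unavailable.
                raise PermissionError(f"command not available: {name}")
            results.append(self._commands[name](*args))
        return results


# The trusted side decides which specific primitives the untrusted side sees.
config = ConfigSpace()
safe_interp = RestrictedInterpreter({
    "setconfigdata": config.set_config_data,
    "getconfigdata": config.get_config_data,
})

untrusted_script = [
    ("setconfigdata", "reply-to", "orders@example.org"),
    ("getconfigdata", "reply-to"),
]
results = safe_interp.run(untrusted_script)
```

The design choice shown here is that safety comes from what is left out: the restricted interpreter has no general file or process primitives at all, only the narrow commands handed to it by the trusted side.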

Table I. Programming Language Features

Language    OO   Concurrency  Mobility  Safety  Security model  Trust in the object code
Java        ✓    ✓            Fetch     ✓       PL              Verified object code
O’Caml      ✓    Some         Fetch     ✓       PL              Signed object code
Limbo            ✓            Fetch     ✓       OS              Signed object code
Obliq       ✓    ✓            Agent     ✓       PL              No provision
Telescript  ✓    ✓            Agent     ✓       PL              Secure network
Safe-Tcl                      Fetch     ✓       OS              Not applicable

5. REVIEW AND COMPARISON —Safety. All of the studied languages are “safe” in the sense that access The main features of the languages pre- boundaries expressed in the language sented in Section 4 are summarized in are enforced. This is an essential Table I. Let us now review them and property for a language if security use them as the basis for a comparative issues are to be expressed using lan- study. guage constructs. Enforcing safety in- —Object orientation. In the context of volves ruling out pointer arithmetics, mobile code, objects are a convenient checking array bounds, using auto- entity in which to encapsulate data matic memory management, disallow- and programs to be sent on the net- ing arbitrary casts, and dynamically work. They also serve as entities for checking casts that promote objects to grouping information with the same more specialized classes. access restrictions. —Security model. The implementations —Concurrency. In a distributed context, of access control can be roughly clas- the notion of simultaneous and inde- sified in two categories: the operating- pendent computations is a natural system approach and the program- one. For Java and O’Caml, support for ming-language-based approach concurrency is rudimentary; multiple (noted, respectively, as “OS” and “PL” threads of control and corresponding in Table I). In the former, the capabil- serialization are supported (O’Caml ities of applets are hardwired into the applets do not support concurrency, run-time system of the language, and though). Limbo and Telescript add a in the latter, they are programmed in rich support for network communica- the language using protection fea- tion. Limbo includes channels as first- tures such as scope and visibility class values. Concurrency is also in- rules: herent in Obliq. —Java has trusted libraries with ac- —Mobility. We can distinguish two dif- cess to native code functions. The ferent models of mobility. 
trusted libraries are protected —We call code fetching (noted by through a mixture of language con- “Fetch” in Table I) the model used structs and sensitive functions call by Java, O’Caml, and Limbo in upon a (protected) security monitor which the user downloads the code (the SecurityManager) to check dy- to be executed. The initiative is namic access control. with the receiver of the code. —O’Caml uses the module system to —Mobile agents (Obliq and Tele- restrict applets’ use of libraries to a script) are processes that can be fixed safe subset. This subset pro- programmed to migrate themselves, vides a supervised interface to the so the initiative is with the mobile underlying operating system. The code itself. We denote this model security monitor uses a variable of “Agent” in Table I. an abstract data type to represent

ACM Computing Surveys, Vol. 29, No. 3, September 1997 Languages for Mobile Code • 231

Table II. Class and Attribute Protection and Keyword Used Protection offered Java O’Camla Obliqb Telescript Limbo Class protection No subclasses final protected sealed Subclasses cannot add methods closed No instances abstract virtual protected abstract Visible only in same package private NAc NA NA No outside updates protected No aliases NA NA protected NA Mutual exclusion serialize Attribute protection No restriction public defaulta default public exported Visible only in the same package protected NA NA NA NA or in subclasses Visible in subclasses default NA private NA Visible only in the defining class private private NA default Runtime protected reference protect Mutual exclusion synchronized a Visibility in O’Caml can furthermore be restricted using the module system. b Object used as prototypes play the role of classes in Obliq. The restrictions apply only to operations from outside the object. c Not applicable.

its capabilities.
   —Limbo provides all available resources as files in an applet-specific file-system hierarchy, separated from other applets. New resources are obtained through system modules. Limbo offers no support for protecting user-written modules.
   —Obliq has the language constructs necessary to program access control in the language. Examples on how to do this are given in Cardelli [1995], but there is very little detail on how this is exploited by the Obliq system itself.
   —In Telescript, capabilities are represented by protected permission objects.
   —Safe-Tcl offers a fixed and restricted functionality through the built-in functions.
—Trust in the object code. The safety of object code is based either on trust or (in the case of Java) on verification. In the case of O’Caml and Limbo, the trust is based on a cryptographic signature of a trusted authority. Telescript agents are trusted based on their origin as the network is “secure” and sender addresses can be trusted to be correct.

As the security policy for the object-oriented languages is implemented using objects, it is interesting to compare the possibilities for restricting access to part of the objects in the different languages. Table II summarizes the visibility rules for the four object-oriented languages studied in Section 4. We have included a column for Limbo, whose first-class modules can be compared with classes.

6. PERSPECTIVES

The informal treatment of both language and security aspects is a major drawback of all the languages studied. Mobile code is executed within a complete environment (the run-time environment of the language, the Web browser, the operating system, the network, etc.), so arguing about security enforcement is meaningless without a clear specification of the separation of the responsibilities among the various entities of the environment (what entity is assumed to ensure what property?). A number of the flaws discussed in Dean et al. [1996] can be seen as a consequence of the lack of such a clear separation. For example, in Java, classes

loaded from the local file system are more trusted than classes loaded through the network, and thus the former have access to more dangerous operations. Here the integrity of the system depends on both the local operating system and on the Java system. One attack exploited a flaw that made it possible to load classes from anywhere in the file system [Dean et al. 1996]. For this attack to succeed, it must be possible for the attacker to upload a file somewhere on the victim’s file system. This can often be done in a variety of ways, depending on the local operating system. Another attack allowed applets to connect to arbitrary hosts. The attack succeeded due to a weakness in the Java library, where an external name service was implicitly assumed to be trustworthy, which in fact it was not.

Current work on mobile code does not take enough account of the research done in programming language semantics [Nielson and Nielson 1992; Schmidt 1986], formal methods in software engineering, like VDM [Bjørner 1991a,b], Z [Spivey 1988], RAISE [Brock and George 1990], and B [Lano 1996], formal models of security [Bell and LaPadula 1973; Landwehr 1981; McLean 1994], or research on static program analysis [Banâtre et al. 1994; Denning and Denning 1977; Mizuno and Schmidt 1992; Volpano et al. 1996]. As a starting point, a semantic definition of the language would provide an important insight and emphasize the weak parts of its definition with respect to security. Such a definition would also make possible formal statements for the security claims made by the proponents of the language. Having a semantics for the language would not be enough, though. Security is a global property, so a security model must take into account all aspects of the system supporting the execution of the code. This includes in particular the hardware, the operating system, the abstract machine, the module libraries, the security manager, and the browser. A security weakness in just one of these endangers the security of the whole system.

Existing research into formal methods for security [Bell and LaPadula 1973; Landwehr 1981; McLean 1994] provides a strong foundation on which such a model should be built. The challenge consists of integrating the different levels mentioned in a coherent and useful way. The formal model will be useful only if it can support the expression of the global security policy and its decomposition to highlight its impact on the various components of the system. One of the components is the execution model of the mobile code language, which would then be characterized precisely in terms of security requirements. This, in turn, would provide the necessary formal basis for both static and dynamic verifications in the language for mobile code.

Several efforts have recently been undertaken that suggest promising avenues for future research to provide a formal basis for mobile code verification. Among them, let us mention:

—Mizuno and Schmidt [1992] derive a security flow analysis as an abstract interpretation of an enriched denotational semantics. The analysis is compositional; individual modules can be analyzed separately and the results (symbolic expressions) can be combined to obtain an analysis of the entire system.
—Banâtre et al. [1994] present the development of a static analysis for information flow in a simple guarded command language. The information-flow logic is based on a noninterference property. The analysis is then derived through successive refinements of the proof system into a complete algorithm for information-flow analysis.
—Volpano et al. [1996] present a sound type system for secure flow analysis. The multiple sensitivity level of Denning’s lattice model is formulated as a subtyping relation that can be statically checked. The soundness of their


system is formulated as a noninterference property of well-typed programs.
—Necula and Lee [1996] present a technique for safely loading binary extensions into an operating-system kernel. The technique is based on pairing the object code with a machine-checkable proof that the object code respects the published safety policy of the operating system. The proof, formulated in terms of a simple type system, is verified together with the object code by a type checker in the operating system. If it is found to be correct, the code is allowed to execute without any dynamic check whatsoever. Their approach represents the best that can be achieved in terms of performance (no more overhead is incurred after the type checking) without being dependent on cryptographic signatures. The major problems with their technique are the burden of proof generation, which is manual, and the fact that each safety policy potentially requires its own proof.
—Borgia et al. [1996] present a structural operational semantics for the Facile language based on the notion of proved transition systems. Facile is a concurrent functional language based on the π-calculus, a basic language for mobile processes. The semantics can be used to extract causal dependencies, as demonstrated by its use to debug a mobile agent system. This work is very promising, and it will be interesting to see how well it scales to mobile code programming languages.

The advantage of the preceding contributions on program analysis or typing [Banâtre et al. 1994; Mizuno and Schmidt 1992; Volpano et al. 1996] is that they lead to mechanical verifications. Their limitation is that they describe individual security analyses for programs without considering their integration in the general context (including the network, operating system, abstract machine, etc.). As a consequence, the properties of the programs verified by the analyses are not linked to a general security policy for the whole system. On the other hand, the approach of Necula and Lee [1996] can be seen as a more ambitious attempt, but it relies on mostly manual proof construction. One challenge for future work is probably to find an appropriate integration of automatic and interactive proof techniques.

APPENDIX. LANGUAGE EXAMPLES

To give the reader a feel for the different languages, we present examples for Java, Limbo, O’Caml, and Safe-Tcl. Unfortunately, realistic examples would be much too long for this article. We omit Obliq and Telescript, as we feel there are already sufficient examples of their treatment.

A.1 Java

Figure 1 shows the source of a small applet example illustrating the use of two user interface primitives: buttons and labels.

1–2 The two import statements make all classes from the packages applet and awt (abstract windows toolkit) available under their unqualified class name. In this example, we use the class Applet from the package applet and the classes Button and Label from the package awt.
4–6 The Applet class constitutes a framework for applets and all applets are a specialization of the Applet class. The applet in this example, HelloWorld, has two graphics items, a push button and a label. These are declared as the private variables push and lab of class Button and Label, respectively.
8–11 The system initializes applets by calling init. The first thing to do is to call the init defined in the superclass. A new Button instance, initialized to


Figure 1. Source of Java applet HelloWorld.

carry the label “Push me,” is created in line 10. This instance is assigned to the local attribute push. This instance is added to the scene⁴ through the method add, defined in the class Container (not shown) in the package java.awt. Applet is a specialization of Container.
14–17 Button events are delivered to Applet objects by calling their action method. For this example, it is not necessary to check which button was pressed, as there is only one; thus the two arguments, event and arg, are ignored. The applet responds to the button press by removing the button from the scene (line 15) and changing the message in the label (line 16). remove is a method of Container like add, and setText (line 16) is a method of the class Label.

⁴ As in C, assignments are expressions with the value of their left-hand side.

A.2 Limbo

Figure 2 shows a Limbo example that uses a channel to report activations of a button, illustrating communication between Tk and Limbo.

1 The head of the module declares that what follows constitutes the module Hello. Module names begin with uppercase letters.
3–5 The signatures of modules Draw and Tk are included from the files “draw.m” and “tk.m”. The tk: Tk statement declares an instance named tk of the module Tk (initialized to nil). Types and constants exported from modules can be accessed directly from the signature of the module, but functions and variables require a module instance. As this example only uses the type Context from module Draw, there is no need to make an instance.
7–9 The signature of this module exports only one function, init. By convention, programs capable of being executed from the


Figure 2. Limbo example.

top-level shell have a function called init with the signature: a function taking a graphics context (type Context from module Draw) and a list of string arguments. The graphics context provides a reference to a window system, necessary to create new windows.
11–12 The implementation of the init function starts by loading the implementation of module Tk and creating the instance tk. By convention, each module declaration includes a pathname constant that points to the code for the module; this is the second parameter Tk->PATH of the load statement.
13 The variable t is declared and initialized to a reference to the top-level window.
14 The string channel ccmd is declared and instantiated. In this example, a message on this channel signals the termination of the application.
15 The namechan call associates the Limbo channel with a Tk string, thus bridging the two languages. Messages sent on tcmd from the Tk language appear on ccmd.
16–18 These three lines are calls to the Tk graphics library to create a button that sends the string “bye” on tcmd (ccmd) upon a mouse press. The pack command places the button .b on the top level window, and update makes it appear on the screen.
19–20 The last thing to do is to wait for a message on the channel. The value received is ignored.

A.3 O’Caml

Figure 3 shows the complete source of an MMM timer applet.

1 The open statement imports all publicly exported identifiers from the module.
2–5 Safestd is a safe subset of the standard library, containing basic primitives, such as


Figure 3. O’Caml applet source.

string_of_int in the following. The module Safemmm gives restricted access to internals of the MMM browser. Here we only need it to create a top-level window. Safetk contains the graphics toolkit function widgets.
7 The applet is a top-level function f with three arguments of no relevance for this example.
8–9 Creates a top-level window and sets the title to “Time Web.”
10 Applies constructor Text to string “00:00:00,” creating an initialized text label l (a graphic element). Text is imported from the module Tk.
11–17 Defines a function to format the current local time as a string and update the label l. The ^ operator is for string concatenation.
19–23 The tim function updates the text label continuously (with a small pause). Winfo.exists l checks that the window is still present (the user can delete it externally using the window manager). If the window has been removed, the applet stops. The function add_timer registers a thunk (a function of type ( ) → Unit) for execution after a given number of milliseconds. The call to add_timer itself returns immediately.
24–25 The updating is started and the label is installed in the top-level window. The list Fill Fill_X are options to pack, making the label take all


Figure 4. SafeTcl applet source.

available space in the window.
27 The final step is to register the applet function, establishing the connection between O’Caml and MMM.

A.4 Safe-Tcl

Figure 4 gives the partial source of a Safe-Tcl example application. In Tcl, the first word of the line is always the name of a command and the rest of the line contains the arguments, separated by spaces. The backslash at the end of the line allows long statements to be broken across several lines. The square brackets group subexpressions. The curly brackets group arguments: everything up to but excluding the matching closing curly bracket is considered one argument. Optional arguments are preceded by a keyword with a leading dash (-), imitating the Unix command-line convention. Variables are accessed with the dollar operator ($). String concatenation is implicit everywhere; for example, $win.m is the contents of the win variable followed immediately by the string “.m”.

1–9 We define a function, ordershirt, which queries the user for size information, using the SafeTcl_getline function. The result is wrapped up in the body of a mail by SafeTcl_makebody and sent to [email protected] (by SafeTcl_sendmessage).
11 The conditional checks whether it is running in a Tk-capable client.
12 Creates a top-level window and stores it in the variable win. The command set assigns values to variables.
13–18 Equips the window with a leading message and two buttons, one of which activates the previously defined ordershirt upon button press. The built-in commands message and button create a message panel and a button graphic element, respectively.


19–20 The pack command groups the graphics elements together. The pady 20 argument specifies a vertical filling factor.

REFERENCES

ADL-TABATABAI, A., LANGDALE, G., LUCCO, S., AND WAHBE, R. 1996. Efficient and language-independent mobile programs. In Proceedings of the SIGPLAN ’96 Conference on Programming Language Design and Implementation.
ARNOLD, K., AND GOSLING, J. 1996. The Java Programming Language. Sun Microsystems.
BALL, S. 1996. The SurfIt! browser. Available at http://pastime.anu.edu.au/SurfIt.
BANÂTRE, J., BRYCE, C., AND LE MÉTAYER, D. 1994. Compile-time detection of information flow in sequential programs. In Proceedings of the European Symposium on Research in Computer Security, No. 875 in Lecture Notes in Computer Science. Springer-Verlag, New York.
BELL, D., AND LAPADULA, L. 1973. Secure computer system: Mathematical foundations and model. Tech. Rep. M74–244, MITRE Corp.
BILLON, J.-P. 1996. Java security: Weaknesses and solutions. Available at http://www.dyade.fr/actions/VIP/JS_pap2.html.
BIRRELL, A. D., NELSON, G., OWICKI, S., AND WOBBER, E. 1993. Network objects. In Proceedings of the 14th Symposium on Operating Systems Principles.
BJØRNER, D. 1991a. Software Architectures and Programming Systems Design; Volume I: Specification Principles—the VDM Approach. Addison-Wesley/ACM Press.
BJØRNER, D. 1991b. Software Architectures and Programming Systems Design; Volume II: Implementation Principles—the VDM Approach. Addison-Wesley/ACM Press.
BORENSTEIN, N. S. 1994. Email with a mind of its own. Available at ftp://ftp.fv.com/pub/code/other/safe-tcl.tar.gz, 1994.
BORGIA, R., DEGANO, P., PRIAMI, C., LETH, L., AND THOMSEN, B. 1996. Understanding mobile agents via a non-interleaving semantics for Facile. In Proceedings of SAS ’96, Vol. 1145 of Lecture Notes in Computer Science, Springer-Verlag, 98–112.
BROCK, S., AND GEORGE, C. W. 1990. The RAISE method manual. Tech. Rep. LACOS/CRI/DOC/3, CRI: Computer Resources International.
CARDELLI, L. 1995. A language with distributed scope. In Proceedings of the 22nd Symposium on Principles of Programming Languages (Jan.), ACM Press, New York, 286–297. Available at http://www.research.digital.com/SRC/personal/Luca_Cardelli/Obliq/Obliq.html.
DEAN, D., FELTEN, E. W., AND WALLACH, D. S. 1996. Java security: From HotJava to Netscape and beyond. In IEEE Symposium on Security and Privacy (Oakland, CA).
DENNING, D., AND DENNING, P. 1977. Certification of programs for secure information flow. Commun. ACM 20, 7.
DROSSOPOULOU, S., AND EISENBACH, S. 1996. Proving the soundness of the Java type system. Tech. Rep., Imperial College, Oct.
FOX, R. 1996. Internet sabotage. Commun. ACM 39, 11 (Nov.), 9.
FREIER, A. O., KARLTON, P., AND KOCHER, P. C. 1996. The SSL protocol. Available at http://home.netscape.com/eng/ssl3/index.html, March.
GENERAL MAGIC. 1996. The Telescript home page. Available at http://www.genmagic.com/Telescript.
GOSLING, J., JOY, B., AND STEELE, G. 1996. The Java Language Specification. Sun Microsystems.
HANSEN, W. J. 1990. Enhancing documents with embedded programs: How Ness extends insets in the Andrew toolkit. In Proceedings of IEEE Computer Society 1990 International Conference on Computer Languages (New Orleans, March), IEEE Computer Society Press, Los Alamitos, CA, 23–32.
HUITEMA, C. 1996. IPv6: The New Internet Protocol. Prentice Hall, Englewood Cliffs, NJ.
JAVASOFT. 1996. JavaSoft products. Available at http://www.javasoft.com/nav/read/products.html.
KNABE, F. 1996. An overview of mobile agent programming. In Proceedings of the Fifth LOMAPS Workshop on Analysis and Verification of Multiple-Agent Languages (Stockholm, Sweden, June), No. 1192 in Lecture Notes in Computer Science, Springer-Verlag, New York.
LANDWEHR, C. E. 1981. Formal models for computer security. ACM Comput. Surv. 13, 3, 247–278.
LANO, K. 1996. The B Language and Method. Springer-Verlag.
LEROY, X. 1997. Objective Caml. Available at http://pauillac.inria.fr/ocaml/.
LUCENT TECHNOLOGIES. 1996. The Inferno home page. Available at http://inferno.bell-labs.com/inferno/index.html.
MALENFANT, J. 1995. On the semantic diversity of delegation-based programming languages. In Proceedings of OOPSLA ’95.
MCLEAN, J. 1994. Security models. In Encyclopedia of Software Engineering, J. Mariniak, Ed., John Wiley & Sons, New York.
MIZUNO, M., AND SCHMIDT, D. A. 1992. A security flow control algorithm and its denotational semantics correctness proof. Formal Aspects Comput. 4, 6A, 727–754.


NECULA, G. C., AND LEE, P. 1996. Safe kernel extensions without run-time checking. In Second Symposium on Operating Systems Design and Implementation. Available at http://foxnet.cs.cmu.edu/people/petel/papers/osdi.ps.
NETSCAPE. 1997. JavaScript language specification. Available at http://developer.netscape.com/library/documentation/index.html.
NIELSON, H. R., AND NIELSON, F. 1992. Semantics with Applications, a Formal Introduction. Wiley Professional Computing. John Wiley and Sons, New York.
OUSTERHOUT, J. K. 1994. Tcl and the Tk Toolkit. Addison-Wesley, Reading, MA.
PIKE, R. 1997. Private communication.
RESCORLA, E., AND SCHIFFMAN, A. 1996. The secure hypertext transfer protocol. Available at ftp://ds.internic.net/internet-drafts/draft-ietf-wts-shttp-03.txt, July.
ROUAIX, F. 1996a. MMM browser home page. Available at http://pauillac.inria.fr/~rouaix/mmm/.
ROUAIX, F. 1996b. A Web navigator with applets in Caml. In Proceedings of the Fifth International World-Wide Web Conference (Paris, May). Elsevier Science B. V.
RUSSELL, D., AND GANGEMI, G. T., SR. 1991. Computer Security Basics. O’Reilly & Associates, Inc., Sebastopol, CA.
SCHMIDT, D. A. 1986. Denotational Semantics: A Methodology for Language Development. Allyn and Bacon, Needham Heights, MA.
SPIVEY, J. M. 1988. Understanding Z: A Specification Language and Its Formal Semantics. Cambridge University Press, New York.
VOLPANO, D., SMITH, G., AND IRVINE, C. 1996. A sound type system for secure flow analysis. J. Comput. Sec. 4(3).

Received December 1996; accepted May 1997
