<<

XL: A Platform for Web Services

Daniela Florescu Andreas Gr¨unhagen Donald Kossmann

XQRL, Inc. TU M¨unchen TU M¨unchen and XQRL, Inc. [email protected] [email protected] [email protected]

Abstract To date, most Web services are defined using Java. Analyzing the first experiences, it has become clear This papers presents XL, a new platform for that Java (or C# for that matter) are not always the Web services. We have designed XL with three right programming languages for this purpose. One main goals in mind: (a) increase application of the most prominent reasons is that the XML type developers productivity via high-level program- system is incompatible with the Java type system; as ming constructs for Web Services routine pro- a result, a great deal of marshalling code needs to be gramming patterns, (b) achieve high scalabil- written in order to convert XML into Java objects and ity, security, and availability for Web ser- vice versa; this code needs to be written in addition vices and (c) compliance with all W3C stan- to the marshalling code that is necessary in order to dards (e.g., XML, SOAP, WSDL) such that bridge the (long-known) impedance mismatch between XL Web services can interact with any other Java and SQL [10]. Web services written in, say, Java or C#. We Another deficiency of Java is that it is not always hope to achieve these objectives by providing appropriate to deal with failures or quality of service the new XL programming model based on a requirements in a network-centric environment. Fur- simple core programming “algebra” that ex- thermore, a great deal of functionality needed in most tends Milner’s PI-calculus [21]. To optimize Web services such as logging, database access, secu- XL programs, we employ techniques from the rity, authorization, transaction support is not part of design of database systems, compiler construc- the basic Java developer’s kit. J2EE has been devel- tion, and data flow machines, as well as tech- oped as an extension to Java that overcomes this lim- niques specially designed for Web Services. A itation [19]. Indeed, J2EE provides a great deal of ad- demo of the platform has been shown at [14]. ditional features, but the J2EE type system is still in- compatible with the XML type system. Furthermore, 1 Introduction J2EE is a rather complicated programming model so that it is significantly less popular than basic Java. Recently, Web services have been proposed as a new model to develop new applications and to integrate As an alternative, several other programming lan- existing applications on the . The basic guages have recently been developed; examples are idea is that autonomous components inter- WSFL (IBM), WSCL (HP), XLang (Microsoft), and act by exchanging XML messages. This technol- the WSCI developed by a consortium including BEA, ogy is particularly useful to implement processes that Intalio, SAP, and Sun [36, 35, 32, 34]. These languages cross organization boundaries; examples are customer- directly support XML and the Web services paradigm, relationship management, supply-chain management, but they are not as powerful and rich as Java. e-procurement, portals, electronic market places, on- This paper presents XL, a new programming lan- line shops, and games. In addition to the XML-family guage, and its implementation. Like WSFL, WSCL, of standards, the W3C has established standards such XLang, and WSCI, XL directly supports XML, the as XML Protocol (i.e., SOAP) and WSDL, and it has other W3C standards, and the Web services paradigm. started a working group on Web Service Architectures. Like Java, it provides a very powerful programming model; in fact, a great deal of syntax for imperative Permission to copy without fee all or part of this material is constructs (e.g., loops) has been adopted from Java. granted provided that the copies are not made or distributed for Since XL supports all W3C standards and communi- direct commercial advantage, the VLDB copyright notice and cates with other Web services using messages, appli- the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base cations written in XL can communicate with applica- Endowment. To copy otherwise, or to republish, requires a fee tions written in other languages (e.g., Java) just as and/or special permission from the Endowment. well as with other XL applications. Furthermore, XL Proceedings of the 2003 CIDR Conference is portable (like Java) and it provides high-level pro- gramming constructs for routine work (e.g., logging Fortunately, the XML standards did not ignore this and security). important database heritage. The semantics of the The implementation of XL combines techniques current W3C’s XML related programming languages from different fields of computer science; in particular, (XSLT and XQuery) are described in terms of an XML database systems, compiler construction, distributed abstract data model [3] that serves the same purpose systems, and data flow machines. XL programs are as the relational data model for relational databases. translated into a core algebra which is in turn opti- The W3C XML data model describes, in an E/R mized and then interpreted in a dynamic and flexible fashion, a set of entities present in an XML document way. We believe that only through such a high-level in- and a set of relationships among them. The entities terface as provided by XL, scalability, high availability, describe the data itself (e.g. nodes, values, sequences). security, and ultimately quality of service can be guar- The data is modeled using very general mathematical anteed. Using languages like Java and current middle- structures, i.e. as ordered trees of nodes. The internal ware architectures with many layers, such guarantees nodes have node identity and they can be of several cannot be given because the level of abstraction in the kinds (e.g. document, element, attribute, comment, programming model is too low and because calls to processing instruction, namespace) while the leaves of library functions must be treated as black boxes and the trees, i.e. the values, can be values in the do- cannot be optimized. In summary, we believe that mains of the XML Schema basic types (e.g. integer, a high-level programming model like XL will not only decimal, string, duration). Pivotal to the XML data significantly increase the productivity of programmers, model is the notion of sequence. One important prop- it will also improve performance and reduce adminis- erty of the XML data model is that sequences are al- tration and operational cost. ways flat; i.e., sequences of sequences are automati- The XL project is still at the beginning. We have cally unnested. Another important property of the a demo that is available on the WWW [14], but both XML data model is the ability to capture the topolog- the language and the current state of the implementa- ical order in nodes of the document. This order can tion are still under development. In addition, we are be queried and exploited during the computation. currently working on a debugger and tools to test Web services. Testing Web services is particularly challeng- ing because all tests must be side-effect free; for Web 2.2 XML Schemas and the XML Type System services testing, a closed world or lab environment assumption is unrealistic because Web services com- The data model describes only the basic composi- municate with Web services from other organizations. tion of an ordered tree. The XML Schema [29] de- The debugger and test environment are not described scribes structural and content-based constraints on the in this paper; these tools are described in [28]. ordered trees. The XML Schema describes the simple types (with their accepted domain values and the ac- 2 Relevant XML Technology cepted basic operations) supported by the XML data model, the definition of user-defined complex types An important requirement in the design of XL is the and gives a basic support for user-defined integrity compatibility with the existing state of the art in the constraints (e.g. referential integrity constraints, lexi- XML and Web Services area. We detail the current cal constraints). state of the art in this section. The XML type system formally described in [30] captures the essence of the structural information 2.1 XML Abstract Data Model. present in the XML Schema. The goal of the type sys- A very popular wrong myth related to XML is that tem is threefold. First, it is possible to do type check- “XML is just a syntax”. While it is true that the orig- ing: given an expression and the type of the input data inal XML recommendation described only a syntax for set, it is possible to detect statically if the expression data and documents, and not a logical data model, the will return errors on all (or some of) the valid instances W3C is currently in the process of standardizing such of the input type. Second, it allows automatic type in- a logical, abstract data model for XML. The purpose ference: given an XML expression (as described in the of an abstract data model has been clear since the orig- next paragraph) and the type of the input data set, inal papers of Codd in the 70’s: it allows programs to the type system is able to intentionally (i.e., without achieve logical/physical data independence. In other executing the query on any particular data set) de- words, programmers can concentrate on the abstract rive the type of the result. Finally, the type system is representation of data and they can ignore the real capable of testing the type subsumption. This is a use- physical representation of the data. As a result, the ful feature for the following scenario: given an XML physical data representation can evolve without any expression and the type of the input data set, detect impact on the code of the applications itself. The huge automatically if the result of the evaluation of the ex- advantage of this concept has been validated in the last pression on all valid input data instances will be valid 30 years by the success of the database industry. instances of a predefined expected output data type. More detailed information about the type system can of the envelope when used for RPC (Remote Proce- be found in [30]. dure Call) applications, (c) a mechanism for serializ- ing data representing non-syntactic data models such 2.3 XML Expressions and XQuery as object graphs and directed labeled graphs, based on the datatypes of XML Schema and (d) a mechanism A complementary W3C standard deals with XML ex- for using HTTP transport in the context of an XML pressions and XML queries [27]. XQuery is a func- Protocol. tional language. Like all functional languages, XQuery expressions are constructed using first order and sec- ond order function applications starting with variables 2.5 Web Services Definition Language and constants. Examples of first order functions are: Finally, one of the major requirements for the develop- logical, arithmetic, string manipulation, collection ori- ment of Web services is the ability to describe their in- ented operations like union, intersection and differ- terface, the boundary across which applications (Web ence. Examples of second order functions are map services user agents and Web services) communicate. and sort. Of particular importance are the second Applications can then interoperate using this interface. order FLWR expressions: they are XML expressions The Web Services Description Working Group in- constructed based on a pattern that is akin to SQL’s side the W3C has the task of defining such an inter- SELECT-FROM-WHERE queries. Like SQL queries, face. WSDL is composed of the following components: a FLWR expression has a special clause to define vari- (a) the message: a definition for the types and struc- ables and their associated domains (the FOR clause tures of the data being exchanged, (b) the message in XQuery corresponds to the FROM clause in SQL), exchange patterns: the descriptions of the sequence of a special clause that filters variable bindings based on operations supported by a Web service, and (c) the predicates (the WHERE clause in both languages) and protocol binding: a mechanism for binding a protocol a special clause that specifies how to construct the re- used by a Web service, independently of its message sult (the RETURN clause in XQuery corresponds to exchange patterns and its messages. the SELECT clause in SQL). Special expressions called path expressions are used in order to navigate in an All the pieces described above (XML abstract data XML tree; the syntax and semantics of path expres- model, the XML schema and type system, the XML sions are defined in the XPath standard [37]. expressions, the Web Services messaging protocol and XML queries [27] are declarative, side effect free the Web Services abstract interface) constitute excel- programs that manipulate XML data. A query is lent building blocks for a Web Service infrastructure. composed of a preamble containing function defini- Unfortunately, there is no real programming language tions, local type declarations, function declarations, for Web Services in the current state of the art, and XML schema imports, plus a main expression to be we believe that the existing programming languages evaluated and returned as a result of the execution (e.g. Java, C#) or platforms (e.g. J2EE) are not ap- of the program. Unfortunately, the logic of complex propriate for this task. We propose XL as an alter- Web services cannot be described using only declara- native programming language, specially designed for tive programs, or using only side effect free XML query Web Services. expressions. 3 XL Programming Model 2.4 XML Protocol (SOAP) In this section, we describe how Web services can be An important standard for the Web Services area is defined using XL. We describe the fundamental design the XML Protocol [26]. The initial focus of the XML principles of the proposed programming interface, the Protocol (SOAP) standardization effort is to create notion of a Web service and how Web services interact a simple XML-based messaging protocol that can be in conversations, and the syntax of the most important ubiquitously deployed and easily programmed through programming constructs. An example is given in [13]. scripting languages, XML tools, interactive Web devel- opment tools, etc. The goal is a Web enabled layered 3.1 Design Principles system which will directly meet the needs of applica- tions with simple interfaces (e.g. getStockQuote, vali- In [13], we gave a list of 17 desiderata that we believe dateCreditCard), and which can be incrementally ex- to be important for a programming language that sup- tended to provide the security, scalability, and robust- ports the definition and composition of Web services. ness required for more complex application interfaces. Here, we would like to reiterate the five most impor- The XML Protocol is composed of the following four tant principles that drove the design of the XL pro- components: (a) an envelope for encapsulating XML gramming language. data to be transferred in an interoperable manner that allows for distributed extensibility and evolvability as • XL should support a unique data model and type well as intermediaries, (b) a convention for the content system: the standard XML one [27]. • XL should be expressive enough to describe the context of a conversation (Section 3.3). Instances logic of most Web services. of these variables are created whenever a Web ser- vice starts a new conversation with other Web ser- • XL should not just be complete with respect to vices. In an online shop each sales process would Web service specification, but also comfortable to be represented by a single conversation. A vari- use. Hence, it should provide special constructs able in this context might reference for example for important Web services programming patterns the specific customer and the terms of payment (e.g., logging, retry of actions, and periodic ac- which apply in this case. tions). In addition, XL supports local variables which can • With the help of XL, programmers should concen- be used in operations; these variables, however, trate entirely on the logic of their application and are not declared globally. Such (traditional) local not on implementation or optimization issues. variables are instantiated for each block/action in- vocation and destroyed when the execution of the • XL must be compliant with all W3C standards block/action is completed. and it must gracefully co-exist with the current Web services and infrastructure. The values of all the variables in XL are XML values, i.e. instances of the XML data model [3]. 3.2 Web Services in XL In other words, the value of an XL variable can be an XML document or the content of a SOAP From our perspective, a Web service is any au- message. The type of an XL value may (but tonomous software component that is identified by a it is not required to) be constrained using XML unique URI, understands SOAP messages, and whose Schema [29]. However, it is also possible to deal actions can be described by WSDL. Every XL pro- with untyped XML data or with XML data for gram will naturally meet these requirements and will which the type is not known apriori. take the burden from the application programmer to worry about standards like URI, SOAP, or WSDL. • Declarative Web Service Clauses: This part However, this definition of a Web service is very gen- contains invariants (integrity constraints) and eral and does not imply any particular programming other high-level, declarative directives that con- model. In other words, a Web service could also be im- trol the run-time behavior of the Web service; e.g., plemented in Java or any other language. Also, Web declarative error handling mechanisms, declar- services written in different languages can communi- ative discretionary access control, default ac- cate using SOAP and independently from the way they tions, periodic actions, and declarative conversa- were defined. tion patterns (Section 3.3). Invariants will typi- Technically speaking, a Web service defined using cally constrain the values of the global variables, XL generalizes the notion of an XQuery entity. The very much like database integrity constraints. In- definition of a Web service in XL is composed of four variants can also be used to implement certain optional parts: policies; e.g., for security. For instance, an invari- • Web Service Definitions: As in XQuery, local ant could express that a Web service is not allowed functions and types can be defined as part of a to interact with two different banks as part of the Web service definition. Furthermore, schemas and same conversation. namespaces can be imported. The syntax and • Operation Specifications: This part describes semantics are the same as for XQuery. the possible actions (or operations) supported by • Variable Declarations: This part declares the the Web service. In object-oriented terminology, internal data (state) of the Web service; in other the operations of a Web service correspond to the words, this part specifies the global variables methods of a class. The statements that can be of a Web service. XL supports two kinds of used in order to define the operations body are global variables: web-service instance variables summarized in Section 3.4. and conversation-instance variables. Both kinds of variables have a global scope, i.e they are ac- 3.3 Conversations cessible in the entire XL program. Web-service As mentioned earlier, Web services communicate by instance variables have a single instance per entire the means of messages. A conversation is defined as Web Service. An example of this kind of variable a set of correlated messages exchanged between two is the customer database of an online shop which or more participants in order to achieve a certain goal has to be accessed across all conversations. (e.g. business goal, information exchange). Conversa- Conversation-instance variables will have a differ- tions are uniquely identified by URIs; they are usually ent instance per conversation which the Web ser- implemented by carrying the conversation URI in the vice participates in. These variables represent the envelop of each exchanged message. A typical example for a conversation is a user ses- defined patterns, and allows programmers to declare sion in an online shop. First, the user logs onto the them; actions are performed automatically to imple- system (the first message from the user to the sys- ment them. For instance, if the implementor chooses tem plus answer from the system). After that, the a Mandatory conversation pattern, then each incom- user buys something (the second message and answer), ing message must contain a conversation URI (i.e., be and finally the user determines a payment method (the part of a conversation); otherwise, an error is returned. third message and answer). Naturally, the third mes- Also, the mandatory pattern implies that the conversa- sage can only be understood in the context of the first tion URI of an outgoing messages (electronic payment) two messages. is the same as the conversation URI of the correspond- Another example for conversations is an online auc- ing inbound message (a purchase order) that triggered tion. The auction site informs its customers about a the payment; in other words, the inbound and outgo- new product which is for sale; the customers reply by ing messages are part of the same conversation. More sending bids; the auction site, in turn, informs cus- about XL conversation patterns can be found in [15]. tomers about new bids and the status of the auction. Again, a bid can only be understood in the context of a whole auction and the messages exchanged for a 3.4 XL Statements and Combinators particular auction are correlated. The auction exam- The body of an XL operation is described by ple shows that more than two Web services can be in- statements, which are extensions of XQuery expres- volved in a conversation and that interaction patterns sions [27]. In addition to classic imperative state- can be quite complex. Obviously, not all participants ments like variable assignment, conditional state- of a conversation are allowed to listen to all messages ments, loops, error handling and return statements, that are exchanged as part of the conversation; for in- XL supports some XML specific update statements stance, sealed bid auctions could also be implemented and some Web services specific statements (e.g., Web as a conversation. services invocation, logging, sleep). Finally, in addi- XL supports the definition of Web services that tion to the classic imperative statement combinator participate in conversations in two ways. First, XL (sequencing), XL supports other statement combina- frees the programmer from manually managing the tors borrowed from the workflow and dataflow theory contextual data associated to a conversation via the (e.g., dataflow, parallelism, choice). In the following, conversation-instance variables. For example, for the we will briefly describe the most important statements. auction site, the context of a particular auction is the The full set of statements and statement combinators product specification, the customer who wants to sell of the current design of XL is described in [13]. the product, the highest bid, the name of the customer In the expressions of an XL statement all global who made the highest bid, and the list of customers variables declared in the variable declaration part of that participate in a particular auction and would like the Web service are in scope, as well as all the local to be informed about the status of the auction. Obvi- variables defined in the current operation. Finally, for ously, many auctions can be carried out concurrently each operation, two local variables are implicitly de- at the auction site so that many instances of these vari- fined: $input that is automatically bound to the body ables need to be kept: one instance of each variable for of the SOAP message that triggered the current op- each auction. When the auction site receives a mes- eration execution and $output whose value implicitly sage that is related to auction X and that involves the constitutes the body of the response message. execution of the closeAuction operation, then the right instance of, say, the highest bid variable is automati- cally loaded and can be used in the definition of the Variable Assignment closeAuction operation. For a customer, the context of an auction might simply be the highest bid and the The simplest statement is the assignment of a (global information whether this bid was issued by the cus- or local) variable. The syntax is as follows: tomer herself; this information might be need in, say, le t [type] variable := e x p r e s s i o n a react operation that can be defined at a customer’s Web service. Any expression defined by the W3C XQuery pro- The second way in which XL facilitates the conver- posal [27] can be used on the left side of an assignment. sation implementation is the declarative definition of Local variables need not be declared before being used. the conversation patterns. It would be a great deal of The type or XML Schema of a local variable can op- work and error-prone if the programmers would have tionally be set as part of the first assignment to this to manually handle the conversations and their URIs. variable. (Global variables must be declared; the type Fortunately, in practice, there are only a handful of of a global variable can optionally be set when the different patterns that define the way Web services variable is declared.) As in Java, the scope of a local participate in conversations. XL has a set of pre- variable is the block where the variable is defined. Update Statements is halted until the called Web service finishes its execu- tion and returns the result or an error which are both Unfortunately, XQuery does not yet provide expres- also wrapped in SOAP messages. If a variable is given sions to manipulate XML data. Don Chamberlain et as part of the call, then the body of the message re- al. setup a working draft to extend XQuery in this turned by the called service is copied into this variable. respect [11] and once a recommendation has been re- The message is sent exactly once and in a best effort leased by the W3C, XL is going to adopt the syntax way. and semantics of these expressions. In the meantime, As an example, consider the following synchronous we will use the following statements to manipulate service invocation that asks the online broker to buy XML data: 1000 SAP for at most e140.00; the result is stored in • insert in order to add new nodes to the XML hi- the $receipt variable: erarchy (e.g., an additional credit card element) SAP insert < creditcard> ... < /creditcard> 140 into $ c l i e n t Euro 1000 −−> http://www. Broker . com::buy • delete in order to delete nodes from the XML hi- −−> $ r e c e i p t erarchy (e.g., the Visa card) The syntax of an asynchronous call is similar to the delete $client/creditcard [type=”Visa”] synchronous one: • replace in order to adjust elements (e.g., the tele- ==> [ :: ] phone number) [ ==> < operation> ] replace $client/telephone with In terms of the semantics: in this case, the execution (408)8901−23 will not block and the program will immediately con- tinue executing the next statement after the message • rename in order to rename certain nodes (element to the called service has been sent. If the output (reply or attributes ) or error message) needs to be processed, then the name rename $client/name as ” fullname ” of the operation that will process the asynchronous re- sult can be given as part of the call; this operation has to be a member of the Web service that originated the • move in order to move some XML nodes to a dif- asynchronous call. Again, the message is sent exactly ferent location in the XML tree, while still pre- once and in a best effort way. serving the internal structure and the node iden- tifiers. Currently, XL provides no way of setting the en- velop of a SOAP message explicitly. Such constructs move $client/telephone could be useful in order to implement certain kinds after $client/city of conversations, quality of service guarantees, and/or to implement distributed transactions and secure mes- Service Invocation Statements sages. We plan to extend XL in this way once SOAP and the emerging XML Protocol recommendation [22] Probably the most relevant atomic statements in XL have stabilized. are those used for invoking other Web services; i.e., sending a message to another Web service. Often, the other Web service will be written in XL, but messages Conditional statements and Iterations can be sent to any service that has a URI and responds Just like most other programming languages, XL pro- to SOAP messages [26]. Web services are invoked in- vides IF-THEN-ELSE statement, WHILE- and DO- dependently of the specific way they are implemented. WHILE loops (not shown here). The semantics are We propose two ways to invoke a Web service as part straightforward and the same as in other impera- of an XL program: synchronous and asynchronous. tive programming languages. As for example the IF- The syntax of a synchronous call is as follows: THEN-ELSE listing below shows: −−> [ :: ] i f ( < booleanExpression> ) [ −−> < variable> ] then The semantics are straightforward. A message with the value of expression is sent to the Web service iden- endif else tified by uri. If a specific operation of that Web ser- vice should be called, then the name of the operation endelse can also be specified; otherwise, the recipient is ex- In addition to these common control statements XL pected to understand the message without an opera- supports a FOR-LET-WHERE-DO loop, with the fol- tion specification. In a synchronous call, the execution lowing syntax: for < variable> in < expression> | Nondeterministic choice. Execute either “state- le t < variable> in < expression> ment1” or “statement2,” but not both where < booleanExpression> do || Parallelism. Execute “statement1” and “state- The FOR-LET-WHERE-DO loop corresponds to ment2” in parallel. FLWR expressions in XQuery [27], but it executes & Dependency. Considering existing dependencies statements (with potential side effects) instead of eval- betweeen the statements an execution order is uating side-effect free expressions. choosen by the system itself.

Exception handling statements Block Web services implemented using XL signal failure by As in C++ and Java, we use the following syntax to throwing exceptions - just as in Java or C++. The identify a block of statements. The body of an XL pro- syntax of the XL statement that raises an exception is gram, for instance, is formed as a block of statements. as follows: The scope of a variable is the block of statements in throw < expression> which the variable is used for the first time. Here, expression can be any kind of XQuery expres- begin sion. If the exception is not handled locally (see be- low), the execution of the operation terminates and end the value of the expression (instead of the value of the $output variable) is returned as a SOAP message to 4 Optimization and Execution of Web the caller of the service. Just like variables and any Services other expression, the exceptions can be strongly typed optionally. Having described the basic features of the XL pro- XL also adopts the Java syntax for catching excep- gramming model, we would like to turn to a presen- tions. TRY is used to indicate a statement (or se- tation of the design of our platform to execute Web quence of statements) in which an exception might be services. The XL platform has been designed with raised; CATCH is used to write code that reacts to four major goals in mind: (a) high performance, (b) exceptions. The syntax is as follows: high reliability, (c) high security level and finally (d) try very low or inexistent administration costs (i.e auto- endtry matic optimization). We believe that the high level, catch < variable> do declarative programming model of XL will enable us to achieve these goals, while they are very hard to achieve endcatch in an environment entirely based on a lower level im- The variable in the CATCH statement is bound to perative programming language like Java or C#. the value of the data carried by the exception that is Implementing such a platform for Web services in- raised while executing the statement(s) of the TRY volve combining techniques from different disciplines; statement. As in Java, a caught exception will trigger most importantly, database systems, compiler con- the execution of the associated statement. struction, distributed systems, networking, and data flow processing. From database systems, we adopt the 3.5 XL statement Combinators approach to transform programs in equivalent algebra expressions and to carry out transformations / opti- Obviously, the body of an XL program can contain mizations on these expressions. From compiler con- more than one atomic statement. There are several struction, we adopt techniques such as dead code elim- ways to combine statements. In the following “state- ination and peephole optimization. Furthermore, an ment1” and “statement2” can refer to any atomic efficient platform for Web services will employ tech- statement as the ones described in the previous sec- niques like caching and process migration in a cluster tions or to any combination of statements[33]. of servers in order to achieve scalability. Finally, we The typical way to combine statements is by us- advocate the use of pipelining and of data flow process- ing the “;” symbol, like in C++ or Java. Thus, the ing as much as possible; data flow processing has been following means that “statement1” is executed before studied, e.g., in projects like Ptolemy [25]. Exploiting “statement2”. data flow processing is one of many differences between ; our XL implementation and typical implementations Furthermore XL provides a set of additional combina- for, say, Java. tors We present in this section a couple of techniques that will lead us to achieving the four desiderata listed ? Simple error handling. If “statement1” fails, exe- above. We note that not all the techniques presented cute “statement2.”. in the following have been implemented and integrated into our prototype. However, we are currently work- SOAP Outside World SOAP ing on adding these techniques to the system and we hope that we will be able to carry out performance HTTP−Server Message Dispatcher experiments soon.

4.1 Architectural Overview Error Action Test Debug The XL platform has two main components: (a) the Message Handlers XL compiler, and (b) the XL virtual machine. The Virtual Machine role of the XL compiler is to translate the textual rep- resentation of an XL program into a statement graph. Executable Stmt−graph The statement graph is an abstract representation of stmt−graph context Interpreter Threads the definition of an XL Web service, analogous to a database query execution plan, but adapted to state- context default context ments and programs instead of queries. The statement context− stmt−graph operation 1 XML− tree graph uses a core “algebra”, i.e. a set of simple, ba- stmt−graph operation 2 Query context conversation 1 sic statements that are able to support the entire XL stmt−graph operation 3 context context conversation 2 semantics. engine operation 4 context conversation 3 stmt−graph The compilation is done in two steps. First, the tex- stmt−graph operation 5 context conversation 4 List of operations tual representation of an XL program is translated by the XL parser into a naive (usually suboptimal) state- XML Database ment graph. Second, the XL optimizer transforms this naive statement graph into an equivalent but more ef- Figure 1: The architecture of the XL Runtime-System ficient statement graph. The equivalence of statement graphs is defined in terms of the equivalence of their context that contains all the information needed for returned results, but also in terms of the equivalence of the execution. The data produced as the result of the their impact on the context of the conversation where execution of the statement graph is sent back to the they are executed (e.g. variable updates, messages caller as another SOAP message. Figure 1 gives an sent). If we think of a distributed conversation with impression of the XL architecture. different participating servers the context is not just We would like to high-light three important features the locally represented state of a single server but a of the XL virtual machine. First, in order to execute conjunction of the states of all participating servers. statements in parallel, the virtual machine is multi- Furthermore, the optimizer adds annotations to the threaded. Second, the virtual machine is designed to statement graph in order to specify precisely how each be able to stream the intermediate data between state- statement should be executed. ments; pipelining is a very important feature of our The statement graph produced by the optimizer is design. Third, in order to achieve scalability and high interpreted by the XL virtual machine whenever trig- reliability, the XL virtual machine has been designed gered by an incoming message. This process can be to support the migration of processes from one ma- described as follows. chine to another machine in a cluster (we expect that First, each incoming message is processed by a mes- the platform will be installed on a cluster of servers). sage handler. The XL message handlers correspond Some statements and actions change the values of to the various modes of interaction with the XL vir- global variables; such statements involve interaction tual machine. Examples of such modes of interaction with an XML database that stores the values of these are: (a) normal action will invoke an operation of the variables. Furthermore, the XML database stores all Web service in the standard operation mode; (b) error context information and an XML representation of the which deals with inbound messages which are not un- statement graph for each XL operation. Currently, derstood; (c) debug which processes messages invoked we rely on a third-party database vendor for this pur- by the XL debugger; and (d) test which simulates the pose. In fact, at the moment, even a standard rela- actions of the Web service without any side-effects in tional database system could be used for our purposes. order to test a Web service that invokes operations of In order to achieve very good performance, however, 1 another Web service . there must be a tighter coordination between the XL Next, when the normal action message handler compiler, the VM and the database backend. This co- passes a SOAP message to the virtual machine, the ordination is necessary to carry out certain kinds of virtual machine loads (if not cached already) the state- optimizations (see Section 4.5), to exploit indexes of ment graph of the called operation from the database the database, and to operate efficiently in a distributed and executes it. The execution is done in a certain environment (e.g., in a cluster). 1Debugging and testing are not subject of this paper; our The right internal XML data representation is cru- initial design is covered in [28] cial for good performance in the virtual machine. Re- call that all XL programs only manipulate XML data let $_x := expr() (the values of variables are XML documents) and that XL programs (i.e., statement graphs) and contexts are let $_i := 0 represented in XML. In the XL platform all XML data is represented as a vector of tokens; the XML let $_p := ($_i lt count($_x)) parser generates a sequence of tokens for each XML ($_p) for $x in expr do document, in a similar way as a SAX parser gener- let $x := $_x[$_i] ates events2. Representing XML data as vectors of stmt1 | | stmt2 tokens allows a very compact internal representation stmt1 stmt2 and it allows the processing of data in a stream-based way (Section 4.6). In many cases, such a vector never needs to be materialized; instead, the statements are Sync implemented as iterators [17] and consume tokens in a pull-based manner. XML data lives through the entire let $_i := $_i + 1 process inside the XL virtual machine in this internal format, and the data is serialized back into an XML Figure 2: Statement Graph for a FWD- loop string only at the end of the execution of an opera- tion when the result/answer is sent back as a SOAP • Sync: This statement is used to synchronize the message. parallel execution of statements. In the remainder of this section, we will first de- scribe the algebra (i.e., the core XL statements, Sec- • Update: We provide native support for the XML tion 4.2), the statement graph (Section 4.3) and the update statements described in Section 3.4. execution context (Section 4.4). Then, we will briefly sketch possible optimizations (Section 4.5) and the 4.3 Statement Graph two most prominent features of the runtime system: Each operation of an XL Web service is abstractly rep- stream-based processing (Section 4.6) and process mi- resented by a statement graph, similar in spirit to the gration in a cluster (Section 4.7). statement graphs used for data flow analysis and code optimization (see [4]). The nodes of this statement 4.2 XL Core “Algebra” graph are the core statements described above and Although XL is a fairly powerful programming lan- the edges in the graph encode the order in which the guage, all constructs are supported by a very simple statements are to be executed. An edge from state- core algebra composed of six statements. This sim- ment S1 to statement S2 indicates that S1 must be plicity makes XL easy to optimize and to build highly executed before S2. The graph need not be fully con- reliant and secure platforms for XL. In some sense, nected; statements that are not ordered in the graph these six statements can be seen as the pendant to can be executed in parallel. Edges can be annotated the operators of the relational algebra that is used in by Boolean variables that indicate under which condi- order to implement SQL queries. As the operators tion a statement is executed after another statement. of the relational algebra, these core statements can This way, the statement graph encodes if statements be implemented in different ways (e.g., pipelined and and loops. Figure 2 gives a small example. The nodes non-pipelined) and it is the responsibility of the XL of the statement graph are also annotated with addi- optimizer to determine the best way to execute them. tional information like the specific algorithm that has to be used to implement the statement and the esti- 3 • Assignment: As mentioned in Section 3.4, as- mated cost of the statement . signments bind an XL variable to an arbitrary XQuery expression. 4.4 Execution Context Each XL statement is compiled and executed in a con- • Send: This core statement evaluates an expres- text; the context contains the values of variables (e.g., sion and sends the result to the target URI. local variables and conversation-specific variables) and it contains all other information needed to evaluate ex- • Receive: This statement blocks and waits until pressions (e.g., connections and pre-compiled queries). it has received a specific message. The XL statements may need to evaluate XQuery • Wait: This statement blocks until a certain event expressions; such expressions also need an evaluation (e.g., an update to a variable) takes place. context that contains information like the imported schemas, namespace definitions, local functions and so 2The major difference between our token stream and SAX is on [27]. Such contexts can be generated dynamically the fact that SAX events are propagated in a push fashion while our tokens are consumed in a pull mode. 3not shown in figure 2 and are organized in a hierarchy. In such a hierarchy of • Identify local side-effect free service invo- contexts, an XQuery expression is executed in a partic- cations: If side-effect-free invocations are part of ular context; if this context does not contain a specific expressions, it might be possible to avoid these in- definition used in the expression, then the parent con- vocations altogether. Furthermore, the result of text is inspected. The root of the context hierarchy, such an invocation can be cached. the base context, contains the definition of all built-in XQuery types, namespaces and functions [2]. This fea- • If-then-else optimization: First, if the condi- ture is exploited in the XL virtual machine in order to tions of nested if-then-else statements are disjoint, implement scopes of variables and conversations (i.e., the order in which these conditions are evaluated organize the conversational context). can be tuned depending on the selectivities and cost of these conditions [20]. Furthermore, ma- 4.5 Optimization terialization and hashing can be applied to save costs to find the right branch of deeply nested The purpose of the optimizer is to transform the state- if-then-else or switch statements. We expect that ment graph into a more efficient statement graph and evaluating the conditions of swich and if-then-else to annotate the statement graph so that it can directly will take up a substantial part of the cost to ex- be executed. Some of the transformations applied by ecute a Web service, considering trends like the the optimizer are standard techniques from compiler personalization of Web services. construction (e.g., dead code elimination and detec- tion of common subexpressions), some of these trans- This is only an initial, incomplete list of possible formations are known techniques from the database techniques. Moreover, all of the techniques above have camp (e.g., reordering of statements, parallelization, in some way or the other been discussed in the con- and control of data flow and materialization of inter- text of compiler construction, persistent programming mediate results), and some of these transformations languages, or database systems — which optimizations are specific to XL and our particular domain. are particularly amenable and how they interact in our We are currently experimenting with the following specific domain is still subject to future work. All these techniques: optimizations can be carried out very well in XL due to • Common subexpressions: Identify (sub-) ex- the high level of abstraction of the programming lan- pressions in different statements that are guaran- guage and the very simple core algebra and statement teed to produce the same value. This is a classic graph that is used to implement the language. technique studied in compiler construction and in- volves data flow analysis [4]. 4.6 Streaming Execution • Reorder statements: Identify statements Since XL was designed for data-intensive applica- which could be executed in parallel because they tions, the XL virtual machine makes excessive use of do not have any data dependencies. Furthermore, pipelined processing. In fact, the XL virtual machine identify statements that operate on the same data can be characterized as a data flow machine. This in order to improve the temporal locality of the approach is in the database tradition, but it is quite application. different from the way that, e.g., typical Java VMs are designed. Pipelining means that the “results” of one • Multi-query optimization: In certain situa- statement are passed piece by piece (i.e., token by to- tions, it might be better to evaluate two expres- ken) to other statements. For this purpose, each state- sions simultaneously rather than each expression ment is implemented as an iterator, as proposed in [17]. individually. Such situations can be detected In order to understand what pipelining means in XL, in XL because the compiler has a more global we would like to interpret an edge between statements perspective on the application than a traditional S1 and S2 as a set of pipelines (one for each variable) database compiler (i.e. the entire program is anal- through which the values of each variable are passed ysed, not only an isolated query). token by token from statement S1 to S2. Statement S consumes these tokens (if needed) one by one in • Batched updates: Often updates to the same 2 its expressions, and pipes them (if unchanged) to the (or even different) variables can be batched and next statement(s) in the pipeline. executed in one go, thereby saving round-trip Almost all the core statements described in Sec- messages to the database. tion 4.2 can be implemented in such a pipelined fash- • Identify local service invocations: For such ion. The assignment statement, for instance, can be invocations, we could eliminate the (very high) implemented by pipelining the results of the expression overhead of the SOAP protocol. Moreover, the on the right side to the next statements and by car- caller and the callee Web services can be opti- rying out successive inserts if the variable on the left mized together in order to achieve better perfor- side is persistent. The only exceptions from this rule mance. are event and sync which require special attention. The reasons for pipelining in the XL platform are ations and to achieve scalability and high availability essentially the same as in commercial database sys- in a cluster using XL than using Java or J2EE due tems. First, pipelining provides faster initial response to the very simple and clean semantics (core algebra) times; i.e., the system returns the first answers earlier. of the XL programming model. This observation fol- Second, pipelining reduces the main memory require- lows the tradition of relational databases that can be ments and reduces costs to materialize intermediate parallelized very well based on the relational algebra. results (e.g., on main memory or on disk) and/or the However, we still need to prove our intuition. unfortunate swapping. Third, pipelining is a form of parallelism and therefore, potentially reduces the to- 5 Related Work tal response time. Fourth, pipelining is particularly important in the presence of bursts in the network: a The development and composition of Web services (or service can start processing the first inputs while it is e-services) is currently a very active area in both in- waiting for the rest of the input. Finally, pipelining is dustry and academic research. Very good resources very helpful to avoid wasted work or even ensure that that address various aspects of this area are the W3C certain operations terminate in a similar way as lazy workshop on Web services [9] and the latest issue of evaluation does for functional programming. the IEEE Data Engineering Bulletin on e-services [1]. As an example, consider, the following code frag- In the industry, there have been a number of con- ment: crete proposals for new languages and frameworks re- !! infinite input from a sensor lated to our programming language proposal—most le t $a := (1, 0, 1, 0, 1, 0, ... ) prominently, SUN’s J2EE [19] and SunOne [31], and i f ( some $b in $a satisfies $b =1 ) then Microsoft’s .NET initiative [24]. Compaq has devel- le t $output := ”at least one ’1’” ; oped the WebL language [33]; HP has developed the endif eFlow and eSpeak systems [7, 12], IBM is working on a Without pipelining, the execution of the first let language called Web Service Flow Language [36], and statement would not terminate. With pipelining, the Microsoft has recently released their BizTalk Server first 1 will be piped into the if statement which in 2000 [5] and XLang [32]. As mentioned in the intro- turn immediately evaluates to true. As a result this duction, these proposals either fall short to provide operation will simply return the output at least one ’1’ a full programming environment or they are too low- (Existential qualification is very frequent in XQuery level and too complex in order to achieve our main because it is implicit in almost all comparisons. For the objectives: increase the productivity of programmers purpose of clarity, we made it explicit in this example.) and achieve better (automatic) optimization. Of course, pipelining is not always the best execu- Web services are intimately related to workflow sys- tion model and pipelining sometimes comes at an ad- tems. There is of course an extensive literature on ditional cost: overheads for handling individual tokens workflow systems (e.g., [23, 8, 16]). A great deal of and synchronization if several statements consume the the work already done in the area of workflow systems results of a statement. In particular, pipelining must is applicable here, but has to be adapted to the par- be used with great care across iterations of a loop. ticularities of the Web services context. In the academic world, the notion of a service com- 4.7 Process Migration Inside a Cluster position is based on a solid theoretical background con- sisting on the calculi developed first by Hoare [18], Mil- In order to achieve scalability and high reliability, we ner [21] and more recently by Cardelli [6]. However, expect that the platform will be installed on a clus- none of those languages and frameworks are consistent ter of servers. In such a cluster, each server will run with the current XML and Web services state of the a separate XL virtual machine. The XL virtual ma- art, and the proposed approaches to implement those chine has been designed to support the migration of calculi are not designed for data-intensive and scalable processes. To migrate a process (i.e., conversation), applications. simply the context of that conversation needs to be shipped to a different server because the context con- 6 Conclusion tains all the relevant information. Process migration is particularly important in this environment because We presented the design of XL, a new platform for interacting with external, autonomous Web services Web services. XL provides a complete programming (and users) can result in message round-trip times and language that is fully compliant with the Web ser- waiting times of several days. For complex operations, vices paradigm and all related W3C standards (i.e., the virtual machine can also be instrumented to exe- the XML family of standards, SOAP, and WSDL). cute different portions of an activity on different ma- One of the main principles is to provide a very high- chines in parallel, thereby exploiting pipelined and in- level and mostly declarative programming model and dependent parallelism within an operation. Again, we to reduce complexity and the number of software lay- believe that it is going to be easier to parallelize oper- ers (i.e., tiers) in a typical application system. This way, we hope to increase the productivity of program- [14] D. Florescu, A. Gr¨unhagen, D. Kossmann, and mers, make optimization automatic, improve the per- S. Rost. Xl demo. http://xl.in.tum.de/. formance and scalability of application systems. Al- [15] D. Florescu and D. Kossmann. An XML Program- though this has not been the focus of our work so far, ming Language for Web Service Specification and we also hope to make it easier to obtain high availabil- Composition. Technical report, TU Munich, June ity and security / safety using our programming model 2001. and platform. [16] Michael Gillmann, Wolfgang Wonner, and Gerhard The project is still at the beginning. We are cur- Weikum. Workflow management with service quality rently seeking for feedback on the design of the pro- guarantees. In Proc. of the ACM SIGMOD Conf. on gramming language. Furthermore, important concepts Management of Data, Madison, Wisconsin, 2002. such as the transaction model are still open. We plan [17] G. Graefe. Query Evaluation Techniques for Large to fill in these holes as soon as some of the W3C stan- Databases. ACM Computing Surveys, 25(2):73–170, dards (e.g., XML Protocol) have further stabilized. 1993. We have a prototype implementation of the platform [18] C.A.R. Hoare. Communicating Sequential Processes. (see [14]), but a great deal of engineering and exper- Prentice-Hall International, 1985. imentation is still necessary in order to achieve high [19] J2EE. Java 2 enterprise edition . performance and scalability. We are also planning to http://java.sun.com/j2ee/tutorial. carry out pilot projects with customers and their ap- [20] A. Kemper, G. Moerkotte, and M. Steinbrunn. Opti- plications using the XL platform. mizing boolean expressions in object-bases. In Proc. of the VLDB Conf., 1992. References [21] R. Milner. Communication and Concurrency. Pren- [1] Special issue on Infrastructure for Advanced E- tice Hall, 1994. services. Data Engineering Bulletin, 24(1), March [22] XML Protocol; Abstract Model. http://www.w3.org 2001. /TR/xmlp-am/, Jul 2001. [2] XQuery 1.0, XPath 2.0 Functions, and Opera- [23] C. Mohan. Workflow management in the internet tions Version 1.0. http://www.w3.org/TR/xquery- age. Technical report, IBM Almaden Research Center, operators/, August 2002. 2000. [3] XQuery 1.0 and XPath 2.0 Data Model. [24] .NET. http://www.microsoft.com/net. http://www.w3.org/TR/query-datamodel/, Au- [25] The Ptolemy Project . gust 2002. http://ptolemy.eecs.berkeley.edu/. [4] J. Ullman A. Aho, R. Sethi. Compilers, Principles, [26] Simple Object Access Protocol. http://www.w3.org Techniques and Tools. Addison-Wesley, 1988. /2000/xp/Group/, Jun 2002. [5] BizTalk.org. Biztalk initiative . [27] XML Query. http://www.w3.org/XML/Query, Aug http://www.microsoft.com/BizTalk/. 2002. [6] L. Cardelli and R. Davies. Service combinators for [28] R. Rosette. Quality assurarance for web services. Web computing. In IEEE (TSE), 1999. Diploma Thesis, TU M¨unchen, 2002. [7] F. Casati, S. Ilnicki, L. Jin, V. Krishnamoorthy, [29] XML Schema. http://www.w3.org/XML/Schema, and M.-C. Shan. eFlow: a platform for developing May 2001. and managing composite e-services. Technical report, Hewlett Packard, 2000. [30] XQuery 1.0 Formal semantics . [8] Vassilis Christophides, Richard Hull, Akhil Kumar, http://www.w3.org/TR/query-semantics/, August and Jrme Simon. Workflow mediation using vortexml. 2002. IEEE Data Engineering Bulletin, 24(1), 2001. [31] Sun. Sunone. http://www.sun.com/software/sunone. [9] W3C Consortium. Workshop on Web services. [32] S. Thatte. Xlang overview. http://www.gotdotnet. http://www.w3.org/2001/01/WSWS. com/team/xml wsspecs/xlang-c/default.htm. [10] G. Copeland and D. Maier. Making Smalltalk a [33] WebL. Compaq’s web language . database system. In Proceedings of the 1984 ACM http://www.research.compaq.com/SRC/WebL. SIGMOD International Conference on Management [34] WSCI. Web service choreography interface. of Data, pages 316–325. ACM, 1984. http://wwws.sun.com/software/xml/developers/wsci, [11] D. Chamberlin, D. Florescu, et al. Updates for 2002. XQuery. W3C Working Draft, Oct 2002. [35] WSCL. Web services conversation language. [12] eSpeak. The universal language of e-services . http://www.w3.org/TR/wscl10, Mar 2002. http://www.e-speak.hp.com/. [36] WSFL. Web services flow language. http://www. [13] D. Florescu, A. Gr¨unhagen,and D. Kossmann. XL: ibm.com/software/solutions/webservices/pdf/WSFL.pdf. An XML programming language for web service spec- [37] XML Path Language (XPath). http://www.w3.org ification and composition. In WWW2002 Conference /TR/xpath, Nov 1999. Proceedings, 2002.