The Gozer Workflow System
UNIVERSITY OF OKLAHOMA
GRADUATE COLLEGE

THE GOZER WORKFLOW SYSTEM

A THESIS SUBMITTED TO THE GRADUATE FACULTY
in partial fulfillment of the requirements for the Degree of
MASTER OF SCIENCE

By JASON MADDEN
Norman, Oklahoma
2010

THE GOZER WORKFLOW SYSTEM

A THESIS APPROVED FOR THE SCHOOL OF COMPUTER SCIENCE

BY
Dr. John Antonio, Chair
Dr. Amy McGovern
Dr. Rex Page

© Copyright by JASON MADDEN 2010
All rights reserved.

Acknowledgements

I wish to thank my friends and colleagues at RiskMetrics Group, including Nicolas Grounds, Matthew Martin, Jay Sachs, and Joshua Zuech, for their valuable additions to the Gozer Workflow System. It wouldn’t be this complete without them. Thanks also go to the programmers, testers, deployers and operators of Gozer workflows for their patience in dealing with an evolving system, and their feedback and suggestions for improvements.

Special thanks go to my manager at RiskMetrics, Jeff Muehring. Without his initial support (following a discussion consuming most of the duration of a late-night flight to New York) and ongoing encouragement, the development and deployment of Gozer could never have happened.

Finally, I wish to express my appreciation for my thesis advisor, Dr. John Antonio, for keeping me on the right track and guiding me through the graduate process, and for my committee members, Dr. McGovern and Dr. Page, for their support and for serving on the committee.

Contents

1 Introduction and Background
    1.1 Before Gozer
    1.2 From XML to Lisp
    1.3 Gozer Design Philosophy
    1.4 Gozer Development
    1.5 Related Work
2 The Gozer Language
    2.1 Syntax
        2.1.1 The Reader
        2.1.2 Standard Reader Macros
    2.2 Evaluation
        2.2.1 Control Flow
        2.2.2 Macros
    2.3 Object Orientation and Data Types
        2.3.1 Functions
        2.3.2 Java Interfaces
        2.3.3 Generic Functions
        2.3.4 Java Function Façades
        2.3.5 Dynamic Java Access
        2.3.6 Common Data Types
    2.4 Condition System
        2.4.1 Programmer Interface
        2.4.2 Implementation
    2.5 Standard Library
    2.6 Development Environment and Tools
3 The Gozer Virtual Machine and Compiler
    3.1 GVM Architecture
        3.1.1 Image
    3.2 Function Calling Convention
    3.3 Bytecode Design
        3.3.1 Instruction Format
    3.4 Java Exception Handling
        3.4.1 Nested Interpreters
    3.5 The Compiler
        3.5.1 Compilation Phases
4 Workflows
    4.1 Overview
        4.1.1 Tasks and Fibers
        4.1.2 BlueBox
    4.2 Distribution
        4.2.1 Workflow Services
        4.2.2 Non-Blocking Service Requests
        4.2.3 Deflink
        4.2.4 Forking Fibers
        4.2.5 Error Handling
        4.2.6 Implementation
    4.3 Workflow Patterns
5 Conclusion
    5.1 Contributions
    5.2 Future Work
Appendices
    A GVM Bytecodes

List of Listings

2.1 Simplified EBNF-Like Syntax Productions
2.2 Basic Gozer Syntax
2.3 Standard Reader Macros
2.4 The Map Reader Macro
2.5 Substitution Rule
2.6 Block/Return-From
2.7 Catch/Throw
2.8 Manual Macro Creation
2.9 Template Macro Creation
2.10 Closures for CONS
2.11 Generic Functions Example
2.12 Package-based Java Integration
2.13 Clojure-style Java Integration
2.14 Type Conversion
2.15 Sums-of-Squares Variants
2.16 Condition Example
2.17 Restart-Case Expansion
3.1 Required Argument Assembly
3.2 Optional Argument Assembly
3.3 Keyword Argument Assembly
3.4 Complex Keyword Argument Assembly
3.5 Simple Nested Interpreter
3.6 Nested Interpreter With Restart
3.7 Mapcar Simplification Compiler Macro
3.8 Type Inference Examples
4.1 Sums-of-Squares Variants
4.2 Deflink
4.3 Distribution Related Vinz Forms
4.4 Vinz Spawn Limit
4.5 Using A Task Variable
4.6 Task Variable Reader Macro
4.7 Vinz Error Handling

List of Figures

1.1 Gozer Workflow System
2.1 Condition Example Call Tree
2.2 Conceptual Type Hierarchy
2.3 Gozer Debugger and Inspector
3.1 GVM Concepts
3.2 Gozer Instruction Format
4.1 Process Lifetime
4.2 Sample Workflow Lifetime

Chapter 1

Introduction and Background

The Gozer Workflow System (GWS) is a production workflow authoring and execution platform that was developed primarily by the author at RiskMetrics Group. It provides a high-level language and supporting libraries for implementing local and distributed parallel processes. The GWS was developed with an emphasis on distributed processing environments in which workflows (complex business processes) may execute for hours or even days. Key features of the GWS include: implicit parallelization that exploits both local and distributed parallel resources; survivability of system faults/shutdowns without losing state; automatic distributed process migration; and implicit resource management and control.

The Gozer language is a highly dynamic Lisp dialect designed for the rapid development of complex scripts that can easily exploit a distributed environment as well as local parallelism. Although fundamentally object-oriented, it supports multiple programming paradigms, including functional, imperative, object-oriented and generic. The GWS runs on the Java virtual machine and incorporates ideas from languages such as Common Lisp, Scheme, and Java, among others.

RiskMetrics Group has more than 150 workflows written in Gozer, many of which execute every day. The Gozer Workflow System has had several major releases and continues to incorporate new ideas with the intent of making the language easier and more productive for workflow developers. The entire GWS implementation currently consists of approximately 15,000 lines of Java code, and about 10,000 lines in Gozer itself.
Work is ongoing, particularly in the area of improved tool support.

1.1 Before Gozer

In late 2006, after determining that no existing commercial workflow system would meet the needs of RiskMetrics, implementation began on a workflow system that was to be tightly integrated with the “BlueBox” distributed computing environment (briefly described in Section 4.1.2). The initial version of this workflow system was heavily influenced by the early BPEL¹ specifications, and expressed workflows as XML documents. At this stage, the workflow system was able to interact with BlueBox services, and to express distributed parallel computations and distributed loops of identical computations with different data. Running workflows were able to migrate from machine to machine within the BlueBox environment using basic native Java serialization. The system supported the map step of the map-reduce paradigm [8]; the reduce step was not directly supported, because a loop iteration could produce no values, only side effects. Service interaction was written into the XML document as an XML template used to produce the service’s input.

¹ The Business Process Execution Language is a standard designed for specifying interactions with Web Services.

This XML-based workflow system had no support for variables, only the named values of previous service interactions. Furthermore, there was no support for subroutines or scoping of the named return values. Workflow documents were intended to express a simple coordination of steps, essentially shuffling data from one service to another. An explicit design choice was to require all control-flow decisions to be carried out inside a service; thus, the workflow document had no way to express conditionals. This workflow document, while rigid, met the needs of the first generation of workflow processes,