A Programming Language for Software Components
Simon D. Kent, B.App.Sci.(Math), B.Inf.Tech.
March 2010
A dissertation submitted in partial fulfillment of the requirements for the degree Doctor of Philosophy in Computer Science
Faculty of Science and Technology
Queensland University of Technology
Brisbane, Australia

Copyright © Simon D. Kent, MMX. All rights reserved.
simon d [email protected]

The author hereby grants permission to the Queensland University of Technology to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part.

“They say the heat and the flies here can drive a man insane. But you don’t have to believe that, and nor does that bright mauve elephant that just cycled past.”
Terry Pratchett (The Last Continent)
Keywords
Software Components, Programming Languages, Specifications, Interfaces, Modules, State, Re-entrance
Abstract
Component software has many benefits, most notably increased software re-use; however, the component software process places heavy burdens on programming language technology, which modern object-oriented programming languages do not address. In particular, software components require specifications that are both sufficiently expressive and sufficiently abstract, and, where possible, these specifications should be checked formally by the programming language.

This dissertation presents a programming language called Mentok that provides two novel programming language features enabling improved specification of stateful component roles. Negotiable interfaces are interface types extended with protocols, and allow specification of changing method availability, including some patterns of out-calls and re-entrance. Type layers are extensions to module signatures that allow specification of abstract control flow constraints through the interfaces of a component-based application.

Development of Mentok’s unique language features included creation of MentokC, the Mentok compiler, and formalization of key properties of Mentok in mini-languages called MentokP and MentokL.
Contents

List of Figures
List of Listings

1 Introduction
  1.1 Software Components
    1.1.1 Benefits of Component Software
    1.1.2 Requirements of Component Software
  1.2 Component Specifications
    1.2.1 Component Specification and Programming Languages
  1.3 Research Question
  1.4 Contribution and Approach
  1.5 Structure of the dissertation

2 Background: Software Components
  2.1 Foundation Technologies
    2.1.1 Object Technology
    2.1.2 Modules
    2.1.3 Cross-Cutting Concerns
    2.1.4 Trust and Safety
  2.2 Industry Component Standards
    2.2.1 Microsoft: COM to .NET
    2.2.2 Java, EJB, J2/Java EE, and Eclipse
    2.2.3 OMG: CORBA to CCM
  2.3 Components and Programming Languages
    2.3.1 Programming languages
  2.4 Components, Specification and Design
    2.4.1 Programming by contract
  2.5 Related Fields
    2.5.1 ADLs and BISLs
    2.5.2 Coordination Languages
    2.5.3 Protocols, Models and Objects
    2.5.4 Model Checking

3 Scope, Problem and Solution
  3.1 A Basic Programming Model for Software Components
    3.1.1 Notes on Component Pascal
    3.1.2 Interface Model
    3.1.3 Object Model
    3.1.4 Module Model
  3.2 Problems with the Basic Model: State and Object Consistency
    3.2.1 Static Interface Types: Static Method Availability
    3.2.2 Static Module Signatures: Static Interface Availability
  3.3 Research Question
  3.4 Mentok: Enhanced Specifications
    3.4.1 Negotiable Interfaces: Dynamic Method Availability
    3.4.2 Type Layers: Dynamic Interface Availability
  3.5 Summary

4 Negotiable Interfaces
  4.1 Negotiable Interfaces in Mentok
    4.1.1 Negotiable Interface Types
    4.1.2 Objects and Negotiation
    4.1.3 State System
  4.2 A Formal Model of Negotiable Interfaces: MentokP
    4.2.1 Isabelle and Notation
    4.2.2 MentokP Programs
    4.2.3 Program Well-Formedness
    4.2.4 Type System
    4.2.5 State System
    4.2.6 Operational Semantics
    4.2.7 Results
  4.3 Implementation: Negotiable Interfaces and the Mentok Compiler
    4.3.1 Structure of the Compiler
    4.3.2 Runtime Representations
  4.4 Discussion: Problems and Extensions
    4.4.1 Multiple Behaviours
    4.4.2 Non-determinism
    4.4.3 Reachability and Empty Types
  4.5 Related Work
  4.6 Summary

5 Type Layers
  5.1 Type Layers in Mentok
    5.1.1 Type and Component Layers
    5.1.2 Composition Model
    5.1.3 Execution model
  5.2 A Formal Model of Type Layers
    5.2.1 MentokL Programs
    5.2.2 Program Well-Formedness
    5.2.3 State System
    5.2.4 Operational Semantics
  5.3 Implementation: Type Layers and the Mentok Compiler
    5.3.1 Structure of the Compiler
    5.3.2 Runtime Representations and Dynamic Checking
  5.4 Discussion: Problems and Extensions
    5.4.1 Reachability
    5.4.2 Recursive Data Structures
  5.5 Related Work
  5.6 Summary

6 Conclusions and Future Work
  6.1 Future Work

A MentokP: A language with negotiable interfaces
  A.1 Names and Identifiers
  A.2 Types in MentokP
    A.2.1 Type definitions
    A.2.2 Type tests and functions
  A.3 Primitive Values
    A.3.1 Types of values
    A.3.2 Default values
  A.4 State Declarations
    A.4.1 Declarations
    A.4.2 Valuations
    A.4.3 Well-formedness
    A.4.4 Token Bag partial ordering
  A.5 Expressions and Statements in MentokP
    A.5.1 Terms
    A.5.2 Predicates and Functions Over Terms
  A.6 Programs in MentokP
    A.6.1 Method Declarations and Implementations
    A.6.2 Interface Bodies
    A.6.3 Record Bodies
    A.6.4 Programs
    A.6.5 lookups
    A.6.6 Type well-formedness
    A.6.7 Lookup functions
    A.6.8 Negotiable State Well-formedness
  A.7 Type Relationships
    A.7.1 Interface Implementation
    A.7.2 Type widening
    A.7.3 Type results
  A.8 Static Environments
    A.8.1 Environment definitions
    A.8.2 Operations on State Environments
    A.8.3 Environment creation
  A.9 Type System Rules for Expressions and Statements
  A.10 State System Rules for Expressions and Statements
    A.10.1 Stateful parameter checking
    A.10.2 State checking
  A.11 Program Well-Formedness
    A.11.1 Well-formed type lists
    A.11.2 Well-formed field declarations
    A.11.3 Well parameter behaviour lists
    A.11.4 Well-formed method types
    A.11.5 Well-formed method type lists
    A.11.6 Well-formed method declaration lists
    A.11.7 Well-formed interface bodies
    A.11.8 Well formed method bodies
    A.11.9 Well-formed record bodies
    A.11.10 Well-formed variable declarations
    A.11.11 Well-formed programs
  A.12 Runtime Structures and Operations
    A.12.1 Objects
    A.12.2 Local Bindings
    A.12.3 Operations on local bindings
    A.12.4 Object Store
    A.12.5 Stack Frames
  A.13 Operational Semantics
    A.13.1 Small Step Semantics
    A.13.2 Program Bootstrap

B Module signatures with type layers

C MentokL: A language with type layers
  C.1 Programs in MentokL
    C.1.1 Programs
  C.2 Static Environments
  C.3 State System Rules for Expressions and Statements
  C.4 Program Well-Formedness
    C.4.1 Well formed Type Layer decs
    C.4.2 Well-formed programs
  C.5 Runtime Structures
  C.6 Operational Semantics

Bibliography
List of Figures

3.1 Basic interface model
3.2 Basic object model
3.3 Basic execution model
3.4 Module composition model
3.5 Named buffer state diagram and interface contract
3.6 Tightly bound re-entrance
3.7 Application structure defined by components
3.8 Loosely bound re-entrance
3.9 Type layer application structure
3.10 Up layer call
3.11 Cross layer call

4.1 Syntax for simple negotiable interface declarations (no subtyping)
4.2 IFoo protocol with reentrant behaviour
4.3 IBar protocol with cooperative behaviour
4.4 Syntax for declaration a negotiable interface type (with subtyping)
4.5 Protocols for ISizeable, IHost, and IItem (clockwise from upper left)
4.6 Protocol for IHostableContainer
4.7 Syntax for implementing methods
4.8 Syntax for USE
4.9 Negotiation semantics
4.10 Local negotiation semantics
4.11 State checking in action
4.12 IPlay interface
4.13 IPlay2 interface

5.1 Type layer diagram legend
5.2 Syntax for modules with type layers
5.3 Entering IApplicationRoot
5.4 Entering IApplication
5.5 Entering IDataSource
5.6 Illegal up layer call to IApplication
5.7 Illegal cross layer call to IPostMessage
5.8 Negotiation semantics with type layers

B.1 Type layer graph for module Access
B.2 Type layer graph for module Data
B.3 Type layer graph for module WidgetData
B.4 Type layer graph for module Widgets
B.5 Type layer graph for module App
B.6 Minimal type layer graph for module App
B.7 Type layer graph for module ModBar
B.8 Type layer graph for module ModFoo
B.9 Type layer graph for module TypeOrderCycle
B.10 Type layer graph for module ModOrderCycle
B.11 Type layer graph for module TypeEquateCycle
B.12 Type layer graph for module ModEquateCycle
List of Listings

3.1 Basic interface declaration
3.2 Basic record declaration
3.3 Illegal use of named buffers
3.4 Interface specification for IModel and IView
3.5 Implementation of IModel
3.6 Implementations of IView
3.7 Loosely bound re-entrance
3.8 Negotiable interface specification of named buffer
3.9 Preventing simple errors with negotiable interfaces
3.10 Interface specification for IModel and IView
3.11 Implementation of IModel
3.12 Implementations of IView
3.13 Type layer declarations
3.14 Loosely bound re-entrance

4.1 A pair of basic negotiable interface declarations
4.2 Parent interfaces: IHost, IItem, ISizeable
4.3 Child interface: IHostableContainer
4.4 Interface protocols for figure 4.11
4.5 Code state checked by figure 4.11
4.6 Basic interface declaration
4.7 Basic interface declaration
4.8 Basic interface declaration
4.9 Mentok class declaration
4.10 Emitted C# for a Mentok class
4.11 Mentok: negotiation and parameter passing
4.12 Emitted C#: negotiation and parameter passing
4.13 Mentok: Cooperative parameters, local negotiation, and method calls
4.14 Emitted C#: Cooperative parameters, local negotiation, and method calls
4.15 Single behaviour IPlay interface
4.16 Multi-behaviour IPlay2 interface
4.17 Non-deterministic IPlay3 interface
4.18 Choice for non-determinism

5.1 Signature for module LayeredModule
5.2 Custom attributes emitted for LayeredModule
5.3 Body for module LayeredModule
5.4 Emitted code for the body of LayeredModule
5.5 Reachabilty issues with type layers

B.1 Signature for layered module Access
B.2 Signature for layered module Data
B.3 Signature for layered module WidgetData
B.4 Signature for layered module Widgets
B.5 Signature for layered module App
B.6 Signature for layered module ModBar
B.7 Signature for layered module ModFoo
B.8 Signature for layered module TypeOrderCycle
B.9 Signature for layered module ModOrderCycle
B.10 Signature for layered module TypeEquateCycle
B.11 Signature for layered module ModEquateCycle
Statement of Original Authorship
The work contained in this thesis has not been previously submitted for a degree or diploma at any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.
Simon D. Kent March 2010
Acknowledgments
I would like to acknowledge and thank my parents for the help and support they have given me over the years. They provided me with every opportunity but always let me find my own way; I can only hope to be as good a parent to my own children. I would also like to thank my extended family for their support and encouragement over the years.

I was lucky enough to choose a postgraduate career with the Programming Languages and Systems research group at QUT. The members of PLAS, past and present, were and continue to be a huge influence in my life. Thanks must go to everyone who ever comprised my supervision team: Chris Ho-Stuart, for his time as my principal supervisor, Clemens Szyperski, for his inspiration, and especially Paul Roe, who started me on this path and dragged me to the finish line.

I am grateful for the love and support of my friends, who provided me with moments of sanity and controlled amounts of insanity during my studies. In the space that I have, I would especially like to mention Jens Tröger, whom I met as a new doctoral student and who has since become one of my closest friends; Dan Lane, my oldest and closest friend; and Sime Crossley and Bruce Henderson, my crew. I love you guys.

I thank my children, Jessica and Michael, for letting Daddy disappear into his office and for always giving him a hug and a smile. I thank my beautiful wife, Elizabeth, for her support, understanding, tolerance, love, and many cups of coffee. I owe my success to her.

I dedicate this thesis to my grandparents: my grandma, Phoebe Kent, my granddad, Allan Kent, my nanna, Phyllis Massey, and my poppa, Stanley Massey.
1 Introduction
In the Beginning there was nothing, which exploded. (Terry Pratchett, Hogfather)
In the component software paradigm, software applications are built using contractually specified software components, possibly supplied by independent component vendors. The move towards a component-based methodology can be seen as a natural progression in software engineering. In more mature engineering disciplines, electronic engineering being the common example, the use of pre-built components speeds production, increases reliability, and allows vendors to specialize. Why should the same not be true for software engineering?

The software component approach is attractive for technical reasons apart from the appealing notion that software might be “growing up”. Component-based software engineering promises software re-use by design. Software re-use is something of a holy grail for software engineering, promised by each successive new technology but never truly delivered by any. Software components, by definition, are re-usable, composable software abstractions; if software is built from components, software re-use should be commonplace.

So why then is component software not already a reality? Component software promises much, but it makes many technical demands as well. Software components are not simply bits and pieces that are used to build software systems; software components are also units of deployment. In a traditional software development model, an application is designed, built and tested before it is deployed. Software components are designed, built and deployed in parts and then composed after deployment. The implications of this model are profound. A software component must be designed and tested before deployment, while in an “incomplete” form. A software component must be robust enough to work with other software components that may not even exist at the time the component is conceived. A software component must be composable by a third party who may not have access to, or the skills to comprehend, the source code of the component. A software component must also be trusted without the consumer personally examining the code for security faults and threats. Until very recently, the state of the art in software engineering technology simply could not deliver on these requirements.

A cornerstone of software component technology is component specification. Specifications can promote good design and guide testing. Specification of component interfaces can provide abstraction for clients and is necessary to enable composition by third parties. Components need specifications, but modern programming languages commonly used for component construction and composition provide little in the way of formal support for specification. Design by contract has only recently started making inroads into mainstream languages, and it is still common to rely on the type systems of object-oriented languages to specify the interfaces of software components. As a result, component composition is performed using ad hoc configuration and object composition.

This dissertation describes a programming language called Mentok that contains constructs designed to improve the level of component specification provided by modern programming languages. Rather than focus on those purely functional aspects of component specifications that are ably captured by traditional programming by contract approaches, Mentok contains extensions to interface types and module signatures to specify and constrain temporal behaviours, such as object re-entrance.
1.1 Software Components
The definition of a software component as used in this dissertation was formulated at the 1996 Workshop on Component Oriented Software:

“A software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties.”
Component software is fundamentally different to traditional monolithic software, since component software is deployed in parts (called components). This paradigm shift is potentially a source of a great many advantages, but also places a much larger burden on software development tools and processes.
1.1.1 Benefits of Component Software
The benefits of component software have been widely discussed (see Szyperski[Szy02] for an in-depth treatment), and can be summarized as follows:
• Software reuse: Reuse has been a critical theme of software engineering for some time: by reusing software, an application builder need not waste time reinventing logic. Software reuse is implicit in the component software paradigm. Independent deployment and third-party composition promote use of pre-built components, while contractual specification of components enables their re-use under given requirements.
• Encapsulation: An application builder does not need to fully understand the internal logic of the component - only the services that the component provides, which should be a higher-level abstraction of the internal logic.
• Late integration: Components can be integrated into a system after the system has been deployed (e.g. web browser plug-ins). This ability is essential for systems that must utilize new and previously unseen data or functionality.
• Flexibility: The application builder may have a range of implementations of a single component to choose from.
• Adaptability: Components may be replaced and upgraded.
• Lower cost of ownership: The component vendor’s business is the component; therefore, the vendor will need to produce high-quality components with good documentation to survive in a competitive market. The component vendor is likely to be a domain expert, and domain experts will likely produce components with fewer bugs than those produced by non-experts. This reduces the vendor’s maintenance costs.
• Reduced time to market: A corollary of many of the above points is that applications are easier to build which reduces production time.
Not surprisingly, many of these benefits follow directly from the definition of a software component, as given above. A critical part of that definition, upon which all other parts of the definition depend, is contractual specification of components.
1.1.2 Requirements of Component Software
Despite being proposed as long ago as 1968 [McI68], software component markets have only recently begun to emerge. This is not because the component software approach is undesirable, but rather a reflection of the serious technical issues that must be addressed for component software to succeed.

Component software mandates a completely new software construction and deployment model: component software is built and deployed in parts and is composed by independent third parties. The implications of this simple statement for software engineering are profound; some of them are listed below:
• Software is deployed in an incomplete form. Component deployment occurs before composition into final applications, meaning that software components need to be tested for robustness before they are composed. Furthermore, the standards for robustness are much higher for software components. It is not sufficient for a component to work in one specific context, as in a monolithic application; it must work in any context where the component is legally composable.
• Applications are built from higher abstractions. Third-party composition implies that the application builder may not be the component builder and therefore has no a priori knowledge of how the component is implemented. For third-party composition to be more effective than simple source code reuse, third parties should be able to compose components without knowing how the component is implemented. Software components, therefore, must provide abstraction from implementation but still be usable; languages and tools must support software component abstractions.¹
• Applications contain code from several independent sources. Components are composed by third parties, implying that the final application will execute code not written by the application builder. Software engineering processes and tools need to provide a model of trust, not only for correctness, but for issues of safety and security.

Add to these “big picture” concerns the many technical requirements of component software, such as a late binding mechanism for extensibility, separate compilation for component substitution, type and module safety for “wiring” components together, and versioning schemes for application life cycle management, and it is little surprise that component software has only recently become a possibility.

Even less of a surprise is that most of the enabling advances for software components have come in the field of programming language technology. Programming languages provide a means for delivering powerful abstractions and formal techniques in a form that is both readily available and understandable to software engineers. Module technology, object technology, context-based proxy frameworks, and verifiable type-safety are all vital parts of today’s industrial component frameworks.

It is also true, however, that programming languages do not capture all the requirements of component software. Software components need specification, and, for these specifications to be useful, they must be expressed and enforced formally, where possible. Programming languages must, therefore, provide abstraction for expressing and consuming component specifications.
1 White-box reuse also prevents easy replacement of components, thus reducing adaptability and flexibility of component applications.
1.2 Component Specifications
Software components provide abstractions; component vendors build the abstractions and component consumers (application builders) use the abstractions. The abstraction boundary between vendor and consumer is the component specification. Specifications communicate the encapsulated logic of a component to a consumer by describing the services that a component provides without describing the internal workings of the component. Specifications should also communicate the requirements of a component on its environment in order for the component to function safely. Without adequate specifications, third-party composition of components is not practically achievable. Discussion in the literature about what should and should not be in a component specification can be summarized by the following points:
• Specifications must be contractual. For specifications to be useful in a world with third-party composition, a specification must be not merely a description of a component, but also a binding agreement between the consumer and provider about the obligations of each party. Contractual specifications can serve as a point of agreement in a technical sense, as well as in a legal or business sense.

• Specifications should be sufficiently formal. Informal specifications are useful when providing a description of a component’s services and requirements, but do not provide any guarantees since they cannot be meaningfully checked. Formal specifications are preferable when it is practical for the specification to be checked [Szy02]. Furthermore, specifications which provide safety guarantees are preferable over those which can only be checked.

• Specifications must be sufficiently expressive. Component specifications should contain functional requirements, provided and required services, progress conditions, dependencies on external resources, and presence or absence of out-calls, as well as non-functional requirements, such as bounds on execution time and space [Szy02].
• Specifications must be sufficiently abstract. Component specifications define equivalence classes for components with respect to substitution. A specification that is too abstract may define an equivalence class with members that are actually incompatible with respect to substitution; such a specification does not provide enough information to enable correct composition for all the members of the equivalence class. A specification that is too concrete defines an equivalence class with fewer members than could actually be applied in the same context; such a specification provides too much implementation detail about its component, thus overly limiting substitution. White-box (source code) specifications are too concrete as they prohibit substitution, while black-box specifications of only interfaces and progress conditions are too abstract [BW97].

There are many approaches to specifying components; at the time of writing, however, there is no widely accepted or adopted approach for contractual specification of components that also meets the requirements of component specification. It is particularly noteworthy that most modern programming languages used for construction of components in industry lack any specification mechanism beyond the wiring safety of object-oriented type systems.
1.2.1 Component Specification and Programming Languages
Component specifications are most effective when expressed in and checked by programming languages and systems. Use of types and signatures as the basis for component specification helps ensure abstractness. Interface types serve as a unit of specification for individual component roles, while module or component signatures specify composition and replaceability constraints. Static and dynamic checking of elements of component specifications in a language and runtime system helps satisfy the formal and contractual requirements by guaranteeing or restricting certain properties and behaviours.

Most modern programming languages do not satisfy the final requirement of component specification: expressiveness. Requirements of extensibility, such as discoverability and polymorphism, mean that most component systems are object-based or object-like, but the static type systems of object-oriented languages cannot express temporal properties of objects. Progress conditions or software contracts are excellent at capturing purely functional aspects of specification but cannot easily express temporal or intermediate properties of object systems. Finally, interface types and module signatures, when taken as a whole, almost completely fail to capture application architectures or layering.
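As a minimal illustration of this expressiveness gap, consider the following sketch. It is written in Python rather than Mentok or Component Pascal, and the names `Buffer`, `NamedBuffer`, `careless_client` and `careful_client` are hypothetical, chosen only for this example. The interface type lists the available methods but cannot say that `read` is only meaningful after `open`, so a client that violates the intended ordering is still well-typed and fails only at run time.

```python
from typing import Optional, Protocol


class Buffer(Protocol):
    """A static interface type: it lists methods but says nothing about call order."""

    def open(self, name: str) -> None: ...
    def read(self) -> str: ...
    def close(self) -> None: ...


class NamedBuffer:
    """Hypothetical implementation whose methods are only valid in certain states."""

    def __init__(self) -> None:
        self._name: Optional[str] = None  # None means the buffer is closed

    def open(self, name: str) -> None:
        if self._name is not None:
            raise RuntimeError("buffer is already open")
        self._name = name

    def read(self) -> str:
        if self._name is None:
            raise RuntimeError("read called before open")
        return f"contents of {self._name}"

    def close(self) -> None:
        if self._name is None:
            raise RuntimeError("close called before open")
        self._name = None


def careless_client(buf: Buffer) -> str:
    # Well-typed against Buffer, yet it ignores the intended open/read/close order.
    return buf.read()


def careful_client(buf: Buffer) -> str:
    # Follows the intended order; nothing in the type Buffer requires this.
    buf.open("log")
    try:
        return buf.read()
    finally:
        buf.close()
```

Both clients are accepted by a structural type check against `Buffer`; only the implementation's own run-time guards distinguish them. This is the kind of temporal property that a purely static interface type leaves unspecified.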
1.3 Research Question
Components need specification. Programming languages, the primary tool of a programmer, are a natural vehicle for writing, implementing and checking component specifications. The author argues that the state of the art of component specification in programming languages can be improved upon; the lack of progress towards component software has been attributed elsewhere to shortcomings in the programming languages used to define and integrate components [OZ05].

One particular area of component specification that is largely unaddressed is that of component re-entrance conditions. Szyperski [Szy02] states: “the specification problems encountered in recursive re-entrant systems need to be solved in a modular way to cater for components”. This leads to the research question for this dissertation:

Can a component-oriented programming language be extended with constructs that permit specification of re-entrance conditions for components?

The component specifications devised should also satisfy the criteria outlined in section 1.2 above and be contractual, formal, sufficiently expressive and sufficiently abstract.
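The kind of re-entrance at issue can be illustrated with a small sketch. It is written in Python, not Mentok, and the `Model` class, its `count` invariant and the `view` observer are hypothetical names invented for this example. The model makes an out-call to its observers in the middle of an update, and an observer calls back into the model while the model's invariant is temporarily broken.

```python
class Model:
    """Hypothetical model that notifies observers in the middle of an update."""

    def __init__(self):
        self.items = []
        self.count = 0  # intended invariant between calls: count == len(items)
        self.observers = []

    def add(self, item):
        self.items.append(item)
        # Out-call made while the invariant is temporarily broken.
        for notify in self.observers:
            notify(self)
        self.count += 1  # invariant restored only after the out-calls return

    def consistent(self):
        return self.count == len(self.items)


seen = []


def view(model):
    # Re-entrant call: the observer calls back into the model during add().
    seen.append(model.consistent())


model = Model()
model.observers.append(view)
model.add("x")
```

After `model.add("x")` returns, the model is consistent again, but the observer recorded an inconsistent state during the call. Nothing in the signatures of `add` or `consistent` tells a third-party author of `view` that calling back at that point is unsafe, which is the specification gap the research question targets.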
1.4 Contribution and Approach
This dissertation presents an object-based language called Mentok that permits specification of re-entrance conditions for components. Mentok is an extension of Component Pascal [Obe97] and builds upon prior research, specifically the investigation of problems of object re-entrance in components by Szyperski [Szy02], and the use of factorable multisets for specification of component interfaces by Puntigam [PP99].

The original research contribution of this dissertation is the creation of two novel programming language features that enable specification of re-entrance conditions for components. Negotiable interfaces extend regular interface types with multiset protocols that allow specification of out-calls and re-entrance patterns. Type layers are extensions to module signatures that allow specification of the abstract control flow of a component application by means of a partial order over negotiable interfaces; this abstract control flow can constrain re-entrance across loosely-bound interfaces.

The language features have been fully implemented as part of a language called Mentok. The Mentok compiler, MentokC, was created by extending the Gardens Point Component Pascal (GPCP) compiler [GRC01], and all code samples presented in this dissertation have been compiled and executed using MentokC. Two subsets of Mentok, MentokP and MentokL, have been created and formalized using Isabelle/HOL [NPW00]. MentokP and MentokL were created to investigate safety properties of negotiable interfaces and type layers and are not intended to be usable programming languages in their own right.
1.5 Structure of the dissertation
Chapter 2 gives a broad overview of background technologies and related work, with reference to its significance to component technology. Readers already familiar with this work can skip to later chapters. Chapter 3 describes the target problems motivating this dissertation, particularly the need for specification of stateful components.
Chapter 4 presents negotiable interfaces in Mentok, and gives an overview of MentokP, a formalized subset of Mentok, which is presented in Appendix A.
Chapter 5 describes type layers in Mentok, and presents MentokL, a subset of Mentok with negotiable interfaces and type layers. Chapter 5 builds upon the descriptions of chapter 4. Finally, chapter 6 gives conclusions and directions for future work.
2 Background: Software Components
Everything starts somewhere, though many physicists disagree. But people have always been dimly aware of the problem with the start of things. They wonder how the snowplough driver gets to work, or how the makers of dictionaries look up the spelling of words. (Terry Pratchett, Hogfather)
The concept of software components was first raised by McIlroy [McI68]. McIlroy described a market in which programmers build software using commercially available software components built by the vendors of the software component industry. Over 30 years later, at the time of this dissertation’s writing, component methodologies, frameworks and markets have finally emerged; however, McIlroy’s vision has still not been fully realized. This fact alone gives an indication of the enormity of the advances that were required to enable software component technology.
To illustrate this point, one needs only to examine one of the modern component frameworks and the technology it is built upon. Sun’s Java and Microsoft’s .NET platforms offer reasonably similar component models in EJB/J2EE and configured .NET assemblies, both of which are mainly targeted at enterprise business solutions. At a glance, both platforms utilize language technology involving strong typing, object and module technology, separate compilation, reflection, dynamic loading, verification, versioning, and various libraries, services and abstractions for controlling concurrency, participating in transactions, invoking services on objects in other processes or physical locations, and foreign and legacy code interoperability. Under the hood of these frameworks is heavy use of the services offered by modern operating systems. This ignores the exponential growth in hardware speed and storage that has enabled software architects and engineers to build such enormously complex pieces of software.
2.1 Foundation Technologies
The definition of a software component used in this dissertation, as given in chapter 1, is as follows:
“A software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties.” – WCOOP 1996
It is worth noting here that a software component, as a deployable software entity, is an executable piece of software; a software component is not a class, object, or module. That said, software components can be and are implemented using such programming language abstractions. Modules or packages are often compiled into deployable binaries, object-oriented interfaces are used as the basis of component specifications, classes are a convenient abstraction for units of implementation within components, and objects are a unit of instantiation.
The following section describes foundation technologies which have contributed to the emergence of component software.
2.1.1 Object Technology
While software components should not be tied to any particular programming paradigm¹, most component-oriented programming is done using object-oriented programming languages. While it is true that object-oriented languages are currently the languages of choice for many types of programming, it is also true that object technology has been an enabling technology for software components. Object-oriented concepts like polymorphism and dynamic binding can be used to program extensible component systems, but the process of doing so is not defined by object-oriented languages[FF01b].
Simula I was the first programming language to use the concept of objects, which were data structures with associated operators[Dah02], but it was not until the creation of C++[Str97] that object-oriented programming became widespread. Since then, object-oriented programming has only increased in popularity, and modern object-oriented programming languages such as Java[GJSB05] and C#[Mic00] are being used for increasingly varied tasks, from enterprise systems to embedded devices.
Classes and objects are the key concepts of object-oriented programming. Classes are the static specification of objects, and describe the data and operations, called methods, of objects of that class. Classes provide abstraction and encapsulation by separating the class’ external interface, which is accessible to other parts of a program, from the internal details of the class. JavaScript is a notable exception; it has a prototype model of objects rather than classes.
Objects are usually accessed by means of a reference or pointer to a memory location. This enables polymorphism and simplifies certain programming tasks (such as event-driven programming), but it also complicates matters significantly. An object may be referenced, or aliased, from many different parts of a program and may be modified any time an alias is touched. In the absence of alias protection, reasoning about object-based systems is limited. The matter is complicated even further by polymorphism introduced by inheritance.
¹ Unless it is the component-oriented paradigm!
2.1.1.1 Inheritance
From Simula-67 onwards, all object-oriented languages have a notion of inheritance. There are many forms of inheritance, but broadly speaking, inheritance is used to design relations between classes in object-based systems, by indicating substitutability, refining specifications, or refining classes.
In languages where inheritance implies substitutability (polymorphism), dynamic binding is used to determine which code to execute when any of an object’s virtual methods are invoked. Dynamic binding involves looking up the class of the object at runtime to find where the code for the particular method is. In particular, dynamic binding is widely used with interfaces (fully abstract classes) in component object models and is an enabling technology for late-binding.
Interface inheritance is used to specify that a new class will implement the interface or contract of an existing base class or interface. Interface inheritance usually implies that the new class will be substitutable for the parent class or interface, in which case it forms the basis for subtyping or subtype polymorphism. Theoretical bases for object-oriented subtyping can be found in examinations of extensible record calculi as given by Cardelli[Car91]. Pierce and Turner[PT94] give a type-theoretical basis for object-oriented subtyping by using existential types to encode purely functional objects.
Implementation inheritance, or subclassing, is used to create a new class using the interface and implementation of an old class; the new subclass inherits the code from its parent class, and may add new methods or override the virtual methods of the parent class. In most languages, Smalltalk-80 being an exception, subclassing requires that an object of the subclass is substitutable for an object of the parent class.
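The relationship between interface inheritance and dynamic binding described above can be illustrated with a minimal sketch. It is written in Python rather than any language used in this dissertation, and the names `Shape`, `Square`, `Rectangle` and `total_area` are hypothetical, chosen only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class Shape(ABC):
    """A fully abstract class playing the role of an interface (hypothetical)."""

    @abstractmethod
    def area(self) -> float:
        ...


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side * self.side


class Rectangle(Shape):
    def __init__(self, width: float, height: float) -> None:
        self.width = width
        self.height = height

    def area(self) -> float:
        return self.width * self.height


def total_area(shapes: Iterable[Shape]) -> float:
    # The call site names only the interface. Which method body runs is
    # looked up from each object's class at runtime (dynamic binding).
    return sum(shape.area() for shape in shapes)
```

The client function `total_area` depends only on the interface, so any later class that implements `Shape` can be substituted without changing the client.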
Mitchell et al.[MHF93] present an untyped lambda calculus extended with primitives for defining objects, extending objects with new methods, replacing existing methods on existing objects and sending messages. Mitchell et al. then develop a type system that allows methods to be specialized as they are inherited.
Inheritance can further be broken down into single and multiple inheritance, for both interface and implementation inheritance. Single inheritance schemes lead to simple acyclic class hierarchies. Multiple inheritance schemes allow for more complex class hierarchies and more polymorphism, but have to deal with a number of syntactic and semantic problems.
Languages with multiple interface inheritance must handle namespace clashes when parent interfaces have methods with conflicting names or signatures. Languages that target Microsoft’s CLR, such as C# or Visual Basic .NET, handle namespace clashes by allowing classes to explicitly implement interface methods, which takes them out of the namespace of the class. Java currently cannot deal with namespace clashes.
Languages with multiple implementation inheritance must handle all the problems of multiple interface inheritance as well as handle internal data field namespace clashes, and the so-called diamond inheritance problem. Multiple implementation inheritance is currently out of favour with industry due to the perceived complexity it introduces, although the restricted form that Eiffel[Mey92] provides removes some of these complexities.
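As a concrete illustration of the diamond inheritance problem and of a method name clash between two parents, the following sketch uses Python, which resolves both by linearizing the class hierarchy (the method resolution order). This is Python's answer only, not the explicit interface implementation of C# or the restricted scheme of Eiffel, and the class names are hypothetical.

```python
class Base:
    def __init__(self) -> None:
        self.init_order = ["Base"]


class Left(Base):
    def __init__(self) -> None:
        super().__init__()
        self.init_order.append("Left")

    def describe(self) -> str:
        return "Left"


class Right(Base):
    def __init__(self) -> None:
        super().__init__()
        self.init_order.append("Right")

    def describe(self) -> str:  # clashes with Left.describe
        return "Right"


class Bottom(Left, Right):
    """Inherits Base twice, via Left and via Right (the diamond)."""

    def __init__(self) -> None:
        super().__init__()
        self.init_order.append("Bottom")


bottom = Bottom()
# The linearization decides which describe() wins and guarantees that the
# shared Base is initialized exactly once.
linearization = [cls.__name__ for cls in Bottom.__mro__]
```

A caller that needs the other parent's method must name it explicitly, for example `Right.describe(bottom)`, which is loosely analogous to qualifying a method by the interface it belongs to.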
2.1.1.2 Interfaces
In one sense, interfaces can be seen as fully abstract classes. This makes interfaces an ideal language-level entity for specification of a role or behaviour for a class. Interfaces are especially important in component-oriented programming as they serve as an implementation-independent representation of a role in a component-based system. Interfaces are therefore an important enabling technology for re-usability, substitutability, and extensibility in component-oriented systems.
In modern object-oriented languages that support multiple implementation inheritance, interfaces are usually not distinguished from fully abstract classes, since there is no real semantic difference between them. In languages with single implementation inheritance, interfaces are differentiated from fully abstract classes to enable multiple interface inheritance. Multiple interface inheritance allows enhanced polymorphism by allowing a class to implement many roles, but avoids many of the semantic complexities of multiple implementation inheritance.
2.1.1.3 Inheritance vs. composition
Inheritance as a code reuse technique has been questioned numerous times for varying reasons[Sny86, TGP89, RL89, Mag91, Szy02], and while many of the earliest worries have since been solved by advances in object-oriented languages, technical and practical issues still remain.
Taenzer et al.[TGP89] compared inheritance with construction for purposes of software reuse in Objective-C. While they conclude that inheritance can reduce the amount of code written, it can also lead to confusing errors when subclassed objects send messages to themselves, leading to self-interference patterns up and down the class hierarchy. They call this the “yo-yo” pattern. Furthermore, some of these problems could not be fixed without breaking encapsulation and inspecting the code of the parent classes.
Szyperski[Szy02] expands on the yo-yo pattern problem further and also shows how recursive or re-entrant patterns of method calls are difficult to understand and can break contracts in subtle ways. Szyperski goes on to describe one of the biggest problems with inheritance, the so-called semantic fragile base-class problem².
The semantic fragile base-class problem occurs when the implementation of a base class changes in such a way as to break its subclasses. Small changes in self-interfering behaviour can easily be breaking changes for subclasses.
Mikhajlov and Sekerinski [MS97] examine the semantic fragile base-class problem and propose a set of constraints to help avoid it; a much simpler solution is to avoid inheritance except in constrained circumstances and favour a construction or composition approach. Composition approaches avoid self-interference patterns, and help prevent degradation of abstraction layering in object-based systems.
The early empirical study of Taenzer et al.[TGP89] supports this view. They noted that inheritance-based reuse often became more complex as the base class increased in size, and that the programmer was often forced to break encapsulation of the base class to fix bugs. By contrast, construction of a new class using composition did not become more complex, and could be completed without breaking encapsulation to examine the contained class’ code.
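The semantic fragile base-class problem can be made concrete with a small sketch. It is a standard textbook-style illustration in Python, not an example taken from [MS97] or [Szy02], and all names (`BagV1`, `BagV2`, `counting_subclass`, `ComposedCountingBag`) are hypothetical. The two base classes have the same interface and the same observable behaviour; they differ only in whether `add_all` calls `self.add` internally.

```python
class BagV1:
    """Original base class: add_all does not call self.add."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def add_all(self, items):
        self.items.extend(items)


class BagV2:
    """Revised base class: same contract, but add_all now calls self.add."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def add_all(self, items):
        for item in items:
            self.add(item)  # self-call that a subclass may intercept


def counting_subclass(base):
    """Build the same counting subclass on top of either base version."""

    class CountingBag(base):
        def __init__(self):
            super().__init__()
            self.count = 0

        def add(self, item):
            self.count += 1
            super().add(item)

        def add_all(self, items):
            items = list(items)
            self.count += len(items)
            super().add_all(items)  # with BagV2 this re-enters add() above

    return CountingBag


class ComposedCountingBag:
    """Composition: forwards to a contained bag instead of inheriting."""

    def __init__(self, bag):
        self._bag = bag
        self.count = 0

    def add(self, item):
        self.count += 1
        self._bag.add(item)

    def add_all(self, items):
        items = list(items)
        self.count += len(items)
        self._bag.add_all(items)  # the inner self.add never reaches us


def count_after_add_all(bag):
    bag.add_all([1, 2, 3])
    return bag.count
```

The subclass counts correctly on `BagV1` but double counts on `BagV2`, even though no client of the base class could observe the change. The composed version is unaffected because the contained object's self-calls cannot be redirected into the wrapper.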
2.1.1.4 Mixins
Mixins [BC90] have been proposed as a language concept to aid component-oriented programming [Fla99, Sma99, OZ05]. Mixins are like an interface with an accompanying implementation, and a class inheriting a mixin inherits the mixin’s methods and implementations. Mixins are generally not complete classes, however, and should not be instantiated; rather they are class-to-class transformations, something like functors for modules.
Flatt et al. show in [FKF98] how extending Java with mixins can increase code reuse in a language with single inheritance, while avoiding some of the problems encountered in multiple inheritance. However, the use of mixins in component-oriented programming is questionable, since the use of implementation inheritance means that the problems of the semantic fragile base class and self-interference both apply.
Reuse with mixins is usually limited to compile time, since most languages do not allow the inheritance hierarchy to change at runtime. In this sense mixins are just an extension to the type hierarchy and do not enable extensibility or late-binding. XOTcl is one language that allows mixins to be added into a type hierarchy, or into the type of an object at runtime. XOTcl even allows the use of per-object mixins[NZ99]. Per-object mixins may be applied to an object dynamically to give an object access to several supplemental classes. Per-object mixins are inserted into a method chain and can be used in a somewhat similar fashion to composition filters or aspects [GNZ00].
Scala[Ode06] is an object-oriented, functional language that provides mixin functionality. Traits in Scala are abstract classes that are used as mixins for class composition. Odersky and Zenger[OZ05] show use of mixin composition in the Scala compiler as a case study of language abstractions for extensible and scalable component programming.
² The syntactic fragile base-class problem, while still potentially a problem, is easier to avoid and generally not a problem in mature class libraries.
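The view of a mixin as a class-to-class transformation can be sketched as a function that takes a base class and returns a new subclass. This is a minimal Python illustration of the idea only; it is not the mixin syntax of [FKF98], XOTcl or Scala, and the names `with_undo` and `TextField` are hypothetical.

```python
def with_undo(base):
    """Mixin as a function from classes to classes.

    Assumes (hypothetically) that `base` provides get_text() and set_text().
    """

    class Undoable(base):
        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self._history = []

        def set_text(self, text):
            # Remember the old value, then defer to the base implementation.
            self._history.append(self.get_text())
            super().set_text(text)

        def undo(self):
            if self._history:
                super().set_text(self._history.pop())

    return Undoable


class TextField:
    def __init__(self, text=""):
        self._text = text

    def get_text(self):
        return self._text

    def set_text(self, text):
        self._text = text


# Applying the transformation yields an ordinary, instantiable class.
UndoableTextField = with_undo(TextField)
```

Because `Undoable` overrides `set_text` and relies on the base class calling pattern, it inherits the self-interference concerns noted above for implementation inheritance in general.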
2.1.2 Modules
Modules were proposed by Parnas[Par72] as a means of writing specifications for parts of programs, such that other software might use the specified part without any additional information. The defining characteristic of a module is the separation of interface from implementation; this permits separate compilation. Modules may encapsulate several abstractions, such as classes or ADTs, and generally cannot be instantiated.
2.1.3 Cross-Cutting Concerns
Traditional hierarchical modular design seeks to achieve separation of concerns where possible for purposes of abstraction, encapsulation and maintainability. In many systems, there are certain functions or capabilities that cannot be easily factored into traditional hierarchical modular design, and instead cut across several modules, reducing abstraction and potential for reuse. Cross-cutting concerns (CCCs) can also seriously affect composability of software components when components from independent third parties attempt to provide or use services of a CCC. Classic examples of CCCs are system aspects such as concurrency control, transactional support and logging, where often the entire system needs to be inspected to understand or debug this functionality.
Aspect Oriented Programming (AOP) [KLM+97] is a design and programming methodology that, when combined with an AOP language such as AspectJ™, allows code from cross-cutting concerns to be modularized in a programming language construct called an aspect. Aspects contain all the code necessary to implement the functionality of the cross-cutting concern, along with information on where the aspect code needs to be woven into the normal modules or classes.
Compile-time weaving of cross-cutting concerns is not suitable for component-based applications; this would require independent component vendors to use the exact same aspect code for their components in such a way that all components interoperate. A more suitable approach is to use a run-time or interception technique such as composition filters [AT98] or context-based techniques provided by the major component frameworks. Interception techniques work by intercepting method calls or returns to objects (e.g. via lightweight proxies) based on some sort of policy. Once a method call or return has been intercepted, the interception framework can execute the required code to implement the CCC.
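The proxy-based interception just described can be sketched as follows. This is a minimal Python illustration under stated assumptions: the policy is simply "record every call and return in a log", and the names `InterceptingProxy` and `Account` are hypothetical. It does not reproduce composition filters or the context machinery of any real component framework.

```python
class InterceptingProxy:
    """Lightweight proxy that intercepts method calls and returns."""

    def __init__(self, target, log):
        self._target = target
        self._log = log

    def __getattr__(self, name):
        # Only reached for names not found on the proxy itself.
        attribute = getattr(self._target, name)
        if not callable(attribute):
            return attribute  # plain data passes through untouched

        def intercepted(*args, **kwargs):
            self._log.append(("call", name, args))      # before the call
            result = attribute(*args, **kwargs)
            self._log.append(("return", name, result))  # after the return
            return result

        return intercepted


class Account:
    """Component code that knows nothing about logging."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance


log = []
account = InterceptingProxy(Account(), log)
account.deposit(10)
```

The logging concern lives entirely in the proxy, so it can be attached to or removed from a component at configuration time without recompiling the component.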
Interception techniques are less efficient than compile-time weaving, but have the advantage of being configurable without recompilation. Interception techniques are ideal for component application builders who need to configure concurrency control or transaction participation of commercial components.
Context-based interception frameworks group objects according to policies and enforce those policies whenever a policy (or context) boundary is crossed. The major component frameworks implement most of their component services through context-based interception techniques. The .NET framework was the first framework to open its interception framework to allow programmers to develop their own contexts.
One major problem with AOP and other approaches for modularizing cross-cutting concerns is the lack of understanding of what happens when aspects or CCCs are composed. This problem is especially acute in AOP languages that totally separate the aspect code from the normal code and allow more than one aspect to be woven in at a particular location without warning.
Monads [Mog91] have been successfully used in the pure functional programming language Haskell to model computations with side-effects such as I/O. De Meuter [De 97] noted the similarities between monads in functional languages, which allow modularization of previously non-functional behaviours such as I/O, and aspects. Monads do not suffer from feature composition problems in the same manner as aspects in AOP languages, since the order and location of where monads are applied is visible in the type system of functional languages.
2.1.4 Trust and Safety
In a component-oriented software market, establishing trust between component vendors and consumers is vital. While contractual specifications should enable builders of component systems to construct component systems, consumers also need prima facie guarantees that a component will not cause certain errors or access privileged resources. Szyperski [Szy02] notes that trust is a matter of reducing the unknown to the known in a trusted way. Most trust approaches seek to shift the focus of trust from the code distributor to a third party.
A simple but naïve approach to establishing trust is the use of digital signatures or certificates. Signed components come with a digital signature or certificate that is verified using cryptography and/or the help of a third party. A consumer needs to trust the signing or certificate process, and also the vendor signing the component.
Sandboxing is another technique, by which components are instantiated in an environment with restricted privileges. Any attempt by a sandboxed component instance to access a privileged resource is intercepted and prevented by the environment, usually throwing an error in the process. Only the execution environment and the sandbox policy need to be trusted to use such components.
A more formal approach to trust is ensuring safety properties. A binary that is provably safe in some fashion is guaranteed to abstain from potentially trust-breaking operations. Platforms such as Java and Microsoft’s .NET use verification of type safety to ensure that binaries pass strong type checking, and as a result, abstain from interfering with certain memory regions.
Proof carrying code (PCC)[Nec97] is a technique in which, along with any code, software vendors also distribute a safety proof that confirms that the code adheres to some safety policy. The code consumer then needs only to validate the safety proof using a checking algorithm and, assuming the proof is validated, the consumer may be confident that the code adheres to the safety policy. With PCC, the consumer needs only to trust the correctness of the checking algorithm to establish trust. A major problem with PCC is that the proofs themselves can be very large compared to the programs they accompany. This is because the code they prove safe is usually machine code, and at a low abstraction level[Fra03].
Code verifiers check the intermediate form of virtual machines, such as the JVM [LY99], to ensure that the semantic gap between the source language and the intermediate form is not exploited[Fra03]. Verification can ensure that a program in an intermediate form is type safe, has legal control flow, and assigns legal values to all variables before they are first used. As such, verifiers can ensure (or at least make it highly likely) that privileged memory is not read or overwritten by rogue components. The focus of trust is shifted from the vendor to the runtime and verifier. Managed execution environments also provide further guarantees by providing bounds checking, and checking caller contexts for privileged library operations.
Programs written in an inherently safe code format [ADvRF01, HSF02] do not need to be checked for certain kinds of errors; the code format itself can only express “legal” programs. Inherently safe code formats are based on compressed abstract syntax trees, and are much simpler to verify than typed intermediate codes, but are much more memory intensive on the client side[Fra03].
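Two of the verifier checks mentioned above, legal control flow and assignment before first use, can be illustrated on a toy instruction format. The sketch below is in Python and makes strong simplifying assumptions: the four instructions (`store`, `load`, `jump`, `branch`) are invented for this example, only forward jumps are permitted so a single pass suffices, and types are not checked at all. It is not the JVM or CLR verification algorithm.

```python
def verify(program):
    """Check a toy program given as a list of (opcode, argument) pairs.

    Returns True only if every jump target is a strictly forward, in-range
    instruction index and every load reads a variable that has been stored
    on all paths reaching it.
    """
    n = len(program)
    # assigned_at[pc]: variables definitely assigned on every path to pc.
    assigned_at = {0: frozenset()}
    for pc, (op, arg) in enumerate(program):
        if pc not in assigned_at:
            continue  # unreachable instruction, nothing to check
        assigned = assigned_at[pc]
        if op == "store":
            assigned = assigned | {arg}
            successors = [pc + 1]
        elif op == "load":
            if arg not in assigned:
                return False  # use before assignment
            successors = [pc + 1]
        elif op in ("jump", "branch"):
            if not isinstance(arg, int) or not pc < arg <= n:
                return False  # illegal control flow
            successors = [arg] if op == "jump" else [arg, pc + 1]
        else:
            return False  # unknown opcode
        for succ in successors:
            if succ < n:
                previous = assigned_at.get(succ)
                # Where paths merge, keep only what is assigned on all of them.
                assigned_at[succ] = (
                    assigned if previous is None else previous & assigned
                )
    return True
```

Restricting jumps to forward targets means every predecessor of an instruction has been processed before the instruction itself; a verifier for a real intermediate form must also handle loops, which requires iterating to a fixed point.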
2.2 Industry Component Standards
It was not until the early 1990s that the first widespread use of binary components occurred. The appearance of Rapid Application Development (RAD) environments, such as Microsoft’s Visual Basic and Borland’s Delphi, allowed programmers to “drag and drop” visual components onto window “forms” to create window applications. Components for such environments could be authored and deployed in an incomplete form, and then composed into applications after deployment. Industry component standards arose soon after, defining models for building and composing non-visual components to implement business logic and middleware.³
2.2.1 Microsoft: COM to .NET
Microsoft’s Component Object Model (COM) specifies a wiring standard and deployment model for binary components. Microsoft’s COM is language independent but platform specific, with the only widely used implementation of COM being in the Windows line of operating systems. COM provides a reflection mechanism and allows polymorphism by allowing a single COM class to implement multiple interfaces. COM was so successful it is now used pervasively throughout the Windows operating systems and the implementation of the CLR.
Microsoft’s COM was the first official standard to enforce a freezing policy on published interfaces. That is, once a COM interface is published, it may not change. New versions of a published interface may inherit from the old to add new functionality, but if a method name or signature needs to be changed or removed, an entirely new interface must be released. While this ensured COM largely sidestepped versioning problems due to changing interfaces, there were still some instances of different versions of components working or not working with different applications.
Microsoft’s Distributed COM (DCOM) prescribes an RPC model for COM. DCOM was more often used for RPC on LANs, as opposed to over the Internet for grid computing.
COM’s biggest problem was that it was a native binary standard. COM components could run code natively without verification. While this allowed many compilers and frameworks to generate COM components, it did not prescribe that components should be verifiably type safe. “ActiveX” is a branding of COM components for browser plugins and has been vastly successful as a mechanism for distributing binary components in Internet Explorer. The fact that ActiveX components are unverifiable native binaries has proven problematic. Even with a signed certificate model of trust, many ActiveX add-ins for Internet Explorer are malicious, and ActiveX installations are a prime source of spyware and adware.
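The interface freezing policy described above can be sketched in a few lines. The example is in Python rather than COM's native binary form, and every name in it is hypothetical: `query_interface` is only a simplified stand-in for the idea behind COM's `QueryInterface`, and the spell-checker interfaces are invented for illustration.

```python
from abc import ABC, abstractmethod


class ISpellChecker(ABC):
    """Published interface: frozen, never edited after release."""

    @abstractmethod
    def check(self, word):
        ...


class ISpellChecker2(ISpellChecker):
    """New version: inherits the old interface and only adds to it."""

    @abstractmethod
    def suggest(self, word):
        ...


class OldChecker(ISpellChecker):
    def check(self, word):
        return word in {"the", "component"}


class NewChecker(ISpellChecker2):
    def check(self, word):
        return word in {"the", "component"}

    def suggest(self, word):
        return ["the"] if word == "teh" else []


def query_interface(component, interface):
    """Simplified stand-in for asking a component whether it supports an
    interface: returns the component, or None if it does not."""
    return component if isinstance(component, interface) else None


def suggestions(component, word):
    """A newer client that degrades gracefully on older components."""
    checker = query_interface(component, ISpellChecker2)
    return checker.suggest(word) if checker is not None else []
```

Because `ISpellChecker` never changes, clients written against it keep working with both components, while newer clients discover the added functionality at runtime.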
COM+ is Microsoft’s branding for COM plus enterprise services, and was and is used largely for middleware. COM+ uses what is essentially the same object model as COM, but adds an advanced interception framework based on contexts to deliver services such as concurrency control, automatic transactions and JIT activation, as well as loosely bound events and object pooling. COM+ has a more advanced deployment mechanism than COM, with components being deployed as part of COM+ Applications and configured by system administrators using the COM+ Catalog.
.NET radically changed Microsoft’s component strategy, largely deprecating COM for a number of uses, although COM still lives on in many forms. The .NET Common Language Runtime (CLR) followed the example of the JVM in providing a virtual machine with an intermediate executable format that is verifiably type safe. Unlike the JVM, which is designed exclusively for Java (the language), the CLR is touted as a multi-language platform following Microsoft’s plan of single platform, multi-language development. The intermediate language used by the CLR is Common Intermediate Language (CIL), which is strongly typed and object oriented. CIL is compiled to Microsoft’s Portable Executable (PE) format, which is then JIT compiled before execution. .NET executables are deployed in the form of “assemblies”, which are DLLs or EXEs with rich metadata describing deployment, version and type information. The COM object model has gone, but the COM+ services remain as part of Microsoft’s Enterprise services, and components built using these services can be used as part of COM+ Applications.
.NET provides a much improved versioning mechanism, which even allows side-by-side execution of different versions of an assembly. This is a vast improvement over all other versioning schemes provided by platforms, largely solving many of the “DLL hell” problems that were so prevalent with Windows. It is worth noting that even as of version 2.0 of .NET, Microsoft itself has encountered version problems with the framework itself, requiring the entire platform to be versioned as a whole, rather than in a componentized, per-DLL fashion.
At the time of writing, Microsoft is finalizing a release of the WinFX set of technologies, which includes the Windows Presentation Foundation (code name Avalon), Windows Workflow Foundation, and Windows Communication Foundation (code name Indigo). The WinFX platform is largely an object-oriented set of class libraries, and each new technology provides new object-based abstractions (visual trees, workflows, and communication libraries respectively) and application models. WinFX does not prescribe a single application model or composition story above the model of assemblies provided by the .NET platform.
³ While it can be argued that applications for operating systems were the first type of software components, such applications were complete and stand-alone and not used as parts of other software.
2.2.2 Java, EJB, J2/Java EE, and Eclipse
The Java platform revolutionized web browsing with Java applets and the Java Virtual Machine (JVM). The Java applet model offered many advantages over binary component deployment models, such as ActiveX, as verifiable type safety prevents applets from accessing privileged memory and the JVM can provide fine-grained security control via sandboxing. The Java approach relied on type safety of the Java language [GJSB05], demonstrating that formal properties of programming languages can have useful application without being taxing on software developers.
The JavaBeans[Sun96] specification was Sun’s first attempt to provide a component model for GUI widgets in Java. The JavaBeans specification is somewhat ad hoc, as JavaBeans must follow certain naming conventions and implement certain interfaces in order to function correctly.
The Enterprise Java Beans (EJB) specification introduced a component model and framework for writing components for enterprise applications. Like JavaBeans, the EJB programming model is still largely based around the implementation of certain interfaces and following certain naming conventions. Enterprise Java Beans are hosted inside an EJB Container, which provides the interception framework and runtime services for each Bean instance. While COM+ prescribes a stateless model for transactional components, EJB differentiates between stateless and stateful components, and allows a component instance to represent some state of a participant in a transaction. EJB uses a passivation/activation serialization scheme to handle the lifetime requirements of stateful beans.
The Java Platform Enterprise Edition is the latest incarnation of the Java platform. Based on the Java language, the Java EE is a programming platform for n-tiered applications running in distributed environments. The Java EE defines multiple component-like abstractions, including servlets, portlets, Enterprise Java Beans, and JavaServer Pages.
2.2.3 OMG: CORBA to CCM
The Common Object Request Broker Architecture (CORBA) was created by the Object Management Group to enable interoperation between applications running in heterogeneous environments. CORBA is language independent, and uses the OMG Interface Definition Language (IDL) to specify the interfaces between interoperating applications. IDL is compiled to stubs and skeletons that ensure calls between clients and objects go through an Object Request Broker (ORB). Historically, CORBA can be seen as the first real wiring standard for remote objects from different languages or architectures, but CORBA did not provide a deployment model for software entities until the CORBA Component Model (CCM)[OMG02] in CORBA 3.
The CCM is an application framework for CORBA components, and is in many ways a language-independent superset of the EJB specification. The CCM prescribes a similar model of containers and objects as EJB, with containers providing an implementation of various component services such as transaction management and concurrency control.
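The division of labour between stub, broker and skeleton described above can be sketched in a single process. This is a hypothetical Python illustration only: real CORBA stubs and skeletons are generated from IDL and communicate over the ORB's own wire protocol, whereas here the classes are hand-written, the "wire format" is JSON, and all names (`Broker`, `Skeleton`, `CalculatorStub`, `Calculator`) are invented.

```python
import json


class Calculator:
    """The object implementation living on the server side."""

    def add(self, a, b):
        return a + b


class Skeleton:
    """Server side: unmarshals a request and calls the implementation."""

    def __init__(self, servant):
        self._servant = servant

    def dispatch(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))
        method = getattr(self._servant, request["operation"])
        result = method(*request["arguments"])
        return json.dumps({"result": result}).encode("utf-8")


class Broker:
    """Stand-in for the request broker: routes requests by object id."""

    def __init__(self):
        self._skeletons = {}

    def register(self, object_id, skeleton):
        self._skeletons[object_id] = skeleton

    def invoke(self, object_id, request_bytes):
        return self._skeletons[object_id].dispatch(request_bytes)


class CalculatorStub:
    """Client side: looks like a Calculator but only marshals requests."""

    def __init__(self, broker, object_id):
        self._broker = broker
        self._object_id = object_id

    def add(self, a, b):
        request = json.dumps({"operation": "add", "arguments": [a, b]})
        reply = self._broker.invoke(self._object_id, request.encode("utf-8"))
        return json.loads(reply.decode("utf-8"))["result"]


broker = Broker()
broker.register("calc-1", Skeleton(Calculator()))
calculator = CalculatorStub(broker, "calc-1")
```

The client sees an ordinary object with an `add` method; only bytes cross the broker, which is what allows the two sides to be written in different languages or run on different machines.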
2.3 Components and Programming Languages
Frölich and Franz[FF01b] note that there is broad agreement that “component-oriented programming is good”, but there is much less agreement on “what component-oriented programming is” and certainly none on “how to do component-oriented programming”.⁴ Much of this disagreement stems from the fact that there is a lack of understanding of software components, let alone how to program for and with components; what is clear is that objects and classes, while providing a foundation for component-based programming, are not sufficient by themselves.
Advances in programming language systems, such as verifiable type safety and automatic garbage collection, have brought about increases in safety and trust, as well as productivity and reliability in component systems. The advent of standards for web services and interoperation in distributed environments has enabled deployment of large, n-tiered componentized applications. The programming languages used to build such systems, typically C# and Java, are still largely object-oriented, and use object-oriented types, such as interfaces and abstract classes, as the basis of specification.
2.3.1 Programming languages
There have been many attempts at, and suggestions for, what a component-oriented language should contain. It is widely accepted that interfaces should be an essential part of a component-oriented language, but the type systems of modern object-oriented languages have shortcomings that cause problems in component-based programming.
Büchi and Weck [BW98] make a case for extending the type system of Java to include compound interface types. Compound interface types are a mixture of structural and name equivalence: structural equivalence is used for checking the set of interfaces that a class implements, but name equivalence is used for checking the types of the individual interfaces. Büchi and Weck argue that compound types are necessary in a component-oriented world, where independent vendors may aggregate similar sets of interfaces under different names.
Frölich and Franz [FF99] note that modern object-oriented languages, such as Java, are unsuitable for component-oriented programming, since they do not adequately support implementation of multiple interfaces in the one class. The problems they describe arise due to the way Java merges all methods of parent interfaces into the one class or sub-interface. Syntactic incompatibility arises when two parent interfaces have methods with incompatible signatures.⁵ Semantic incompatibility arises when two parent interfaces have methods with the same signature but different contracts. Frölich and Franz detach messages from interfaces and instead declare messages at the module level in their Oberon-like language, Lagoona. This means that messages must be fully qualified by their module name when being called or implemented, disambiguating messages that would otherwise have been incompatible. C#, and the extended Component Pascal
4 A reinterpretation of Leveson’s [Lev86] observation on the why, what and how of software safety.
5 Same method name, argument number and types, but different return type.
language used in this thesis achieve much the same thing by allowing interface methods to be explicitly implemented. Explicit implementation removes the method from the namespace of the implementing class, meaning that the interface methods can only be called through a reference to the interface (requiring a cast first).
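The effect of qualifying messages by their declaring module can be illustrated with a minimal sketch. The code is written in Python purely for illustration and does not reproduce Lagoona, C# or Component Pascal syntax; the names Message, Receiver, Shapes and Cowboys are hypothetical. Two vendors each declare a message named draw with different contracts, and the receiver implements both without a clash because messages are identified by their qualified identity rather than by bare name.

```python
# Hypothetical illustration: messages are standalone values qualified by
# the "module" that declares them, rather than bare method names.
class Message:
    def __init__(self, module, name):
        self.module, self.name = module, name

    def __repr__(self):
        return f"{self.module}.{self.name}"


# Two vendors independently declare a message called "draw".
SHAPES_DRAW = Message("Shapes", "draw")      # contract: render to a canvas
COWBOYS_DRAW = Message("Cowboys", "draw")    # contract: pull out a weapon


class Receiver:
    """An object that implements messages by qualified identity, not by name."""

    def __init__(self):
        self._handlers = {}

    def implement(self, message, handler):
        self._handlers[message] = handler

    def send(self, message, *args):
        return self._handlers[message](*args)


cowboy_sprite = Receiver()
cowboy_sprite.implement(SHAPES_DRAW, lambda: "rendered sprite")
cowboy_sprite.implement(COWBOYS_DRAW, lambda: "weapon drawn")
```

A caller must name the message it means, for example cowboy_sprite.send(COWBOYS_DRAW), which mirrors the requirement that a client hold the appropriate interface reference before calling an explicitly implemented method.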
2.3.1.1 Composition Languages and Models
Traditional object-oriented programming languages do not provide good abstractions for the common composition operations performed in component-based programming. Instantiation, event registration, and interface plugging are all usually performed in the logic of programming languages, via method calls and field assignments.
Composition languages seek to make component composition simpler by providing abstractions for components and common operations on components. Some are declarative, some treat components as first-class entities, and most are closed under composition operations (a composition of components is a component). Composition languages are generally not intended to replace traditional programming languages but assume that atomic components have been created by some other language and conform to a known executable standard. Most composition languages lend themselves well to diagrammatic representation, some of which have a formal interpretation (e.g. the category-theoretic diagrams in [FLW03]). It is worth noting that the confusion between components and objects is perpetuated in many of these approaches, as the languages developed are actually object-composition languages that use composition techniques regarded as suitable for component development.
Nierstrasz and Meijler [NM95] give several requirements for a composition language. A composition language should:
• treat components as composable, possibly first class abstractions;
• have a standard object model;
• view objects as processes (either active or passive);
• have a type system that can express objects, components and plug compatibility;
• and, finally, be scalable from small to large systems.
Piccola [ALSN00] is a composition language targeting the paradigm of “Applications = Components + Scripts”. Components are viewed as black boxes, with plugs or interfaces for provided and required services, and scripts provide the “glue” that governs how components interact when adaptation is required. Piccola is based on the πL-calculus and so focuses on describing object interactions, rather than static class hierarchies like traditional object-oriented languages.
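The notions of provided and required plugs, and of closure under composition, can be made concrete with a small sketch. It is written in Python for illustration only and is not Piccola; the Component class, the compose function and the Logger, Greeter and App names are all hypothetical. Any required plug left unwired resurfaces as a requirement of the composite, so the result of a composition is again a component.

```python
class Component:
    """A black box with named provided and required plugs (illustrative only)."""

    def __init__(self, name, provides=None, requires=()):
        self.name = name
        self.provides = dict(provides or {})          # plug name -> callable service
        self.requires = {r: None for r in requires}   # plug name -> bound service

    def bind(self, plug, service):
        if plug not in self.requires:
            raise KeyError(f"{self.name} has no required plug {plug!r}")
        self.requires[plug] = service

    def unbound(self):
        return [p for p, s in self.requires.items() if s is None]


def compose(name, parts, wiring):
    """Wire parts together and return the result as a new Component.

    wiring maps (consumer, required_plug) -> (provider, provided_plug).
    Unwired plugs surface on the composite, so composition is closed.
    """
    for (consumer, req), (provider, prov) in wiring.items():
        consumer.bind(req, provider.provides[prov])
    provides = {}
    for part in parts:
        provides.update(part.provides)
    requires = [r for part in parts for r in part.unbound()]
    return Component(name, provides, requires)


logger = Component("Logger", provides={"log": lambda msg: f"[log] {msg}"})
greeter = Component("Greeter", requires=["log", "clock"])
# "Glue": the greeter's provided service is defined in terms of a required plug.
greeter.provides["greet"] = lambda who: greeter.requires["log"](f"hello {who}")

app = compose("App", [logger, greeter], {(greeter, "log"): (logger, "log")})
```

The sketch omits forwarding: binding the composite's remaining clock plug does not propagate down to the Greeter part, which a real composition language would need to handle.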
Bean Markup Language (BML) [CWD00] is an XML-based declarative composition language for JavaBeans. BML provides operations for aggregation (containment), binding events, macro expansion of templates, and recursive composition. BML also allows the inclusion of glue code in compositions to handle compositional mismatch. BML is not closed under composition.
The model developed by Costa Seco and Caires in [SC00a] also provides support for containment, aggregation and forwarding. Costa Seco and Caires extend their model in [SC00b] with genericity by introducing parameterized component types, which allow type information about composed components to be propagated.
The calculus and type system of [Zen02b] provides primitives for forwarding, containment and aggregation. While Zenger’s language extends Featherweight Java, his composition primitives and specifications are largely orthogonal to the language. Zenger focuses on refinement of the structure of components by allowing any component definition to be refined, or used as a prototype, for new components. Zenger also gives a diagrammatic representation for his language extensions to demonstrate component composition.
2.4 Components, Specification and Design
Third-party composition places a high demand on component specifications. A component specification must be detailed enough to allow a component to be correctly and usefully composed, but abstract enough to allow a component to be replaced, or used in different contexts. Component specifications need to be contractual in order to ensure quality, and protect component consumers and end-users [Szy02].
Most researchers agree that component contracts should be formally checkable, whenever possible. Contracts that can be checked or enforced formally promote quality before a component is deployed. Formal methods for components include (but are not limited to) refinement of specifications, contract languages or language support for programming by contract (cf. section 2.4.1), strong typing, verification, and model checking.
Szyperski [Szy02] believes component contracts should contain both functional properties, such as re-entrance conditions and self-recursive patterns, and non-functional properties, such as bounds on execution time and bounds on faults.
Büchi and Weck [BW97] note that while traditional interface specifications sufficiently describe the syntactic requirements of components, traditional pre- and post-conditions are often not sufficient to describe the semantic part of a component. While they agree that white-box specifications, which expose the source code to a user, break abstraction and reduce replaceability and reusability, they argue that black-box specifications are not
expressive enough, as they do not describe call-back semantics. They argue that component specifications should be grey-box, including limited semantic information regarding out-calls. Büchi and Weck [BW99] develop the idea of grey-box solutions further to include callbacks and out-calls in Java-like interface specifications.
Component specification is also required to enable component adaptation. Component adaptation approaches seek to make components reusable in as many contexts as possible by adapting component interfaces and behaviours to fit new contexts. Black-box approaches to component adaptation, such as superimposition [Bos99], are only possible with sufficient syntactic and semantic information. Other approaches to component adaptation [YS97, Reu01] enhance component specifications with protocols from some process formalism and calculate mismatches in processes to determine a suitable adapter.
2.4.1 Programming by contract
Software contracts use pre-conditions, post-conditions and invariants to express properties of programs and can be expressed in mathematics, informal language or programming language structures [MHKM95]. Contracts that are expressed in programming language constructs, such as those in Eiffel [Mey92] and Sather [SO96], can be executed and checked at run-time; Tran et al. [TMA03] have added support for Design by Contract™ to Rotor, a multi-language, managed execution environment.
Contracts add expressive power to a program. They allow a programmer to clearly record the requirements of a routine and the outcome of the routine. Contracts also help to provide checking and correctness. Parts of a contract can be executed as assertions at appropriate points in the program and any false assertions are detected. The advantages of design-by-contract are stated as follows in [MHKM95]:
• Better designs:
– Designs are clearer since the contract between client and server is made explicit.
– Designs are more systematic since programmers are encouraged to think about preconditions etc., since the approach makes this explicit.
– Designs are simpler. The defensive nature of a precondition makes the consequences of calling a procedure illegally clear. Design by contract therefore discourages the programmer from building procedures with complex requirements, resulting in better factored, simpler procedures.
– Inheritance is controlled, since preconditions must not be strengthened in subclasses.
– Exceptions are used more systematically - whenever the contract is broken an exception is raised.
• Improved reliability:
– Since requirements are expressed in assertions and code it is more likely that the problem is better understood.
– Assertions are tested at runtime leading to better tested code.
• Better Documentation:
– Contracts form part of public view of the class.
– Documentation is more reliable since the requirements are tested in code as opposed to informally stated.
– Contracts provide support for some formal methods.
• Easier debugging:
– Bugs are more likely to show up during development because of the assertion mechanism.
– Bugs are more easily fixed during maintenance since the assertion mechanism provides a mechanism for communicating more information about a bug located at runtime.
• Support for reuse:
– Assertions can provide feedback to programmers when using third party developed code.
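As a concrete illustration of contracts executed as assertions, the following sketch attaches a pre-condition and a post-condition to a routine. It is written in Python rather than Eiffel or Sather, and the contract decorator, the ContractViolation exception and the integer_sqrt routine are hypothetical names introduced for this example only.

```python
import functools


class ContractViolation(AssertionError):
    """Raised when a pre- or post-condition evaluates to false."""


def contract(pre=None, post=None):
    """Attach executable pre- and post-conditions to a routine.

    pre(*args, **kwargs) is checked before the call;
    post(result, *args, **kwargs) is checked after it.
    """
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise ContractViolation(f"precondition of {fn.__name__} failed")
            result = fn(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise ContractViolation(f"postcondition of {fn.__name__} failed")
            return result
        return wrapper
    return decorate


@contract(pre=lambda x: x >= 0,
          post=lambda r, x: r >= 0 and r * r <= x < (r + 1) * (r + 1))
def integer_sqrt(x):
    """Largest integer r with r*r <= x."""
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r
```

Calling integer_sqrt with a negative argument raises ContractViolation before the body runs, which is the sense in which a broken contract is reported systematically as an exception.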
Software contracts suffer from a number of problems, such as the inability to express logical quantifiers, the difficulty in expressing some conditions in a programming language (leading to the use of informal comments instead), and the necessity of expressing contracts for abstract data types in terms of the class’s own properties, some of which may not be public. A more serious limitation in the field of object-oriented and component programming is the inability of pre-conditions, post-conditions and invariants to express re-entrance conditions [Szy02], since pre- and post-conditions assume that an operation is atomic.
The benefits of programming by contract are certainly applicable to component-oriented programming. Contracts, when coupled with the static notion of type, give rise to the concept of a behavioural type or dynamic type. Since this program of research includes addressing re-entrance explicitly, pre-conditions, post-conditions and invariants are clearly not sufficient alone to formulate a type system for components.
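The atomicity assumption can be seen to fail in a small sketch. The code is a Python illustration only; the Ledger class and its listener mechanism are hypothetical and are not taken from [Szy02]. The add operation makes an out-call while the object is half-updated, so a callback observes a broken invariant, and a callback that re-enters add trips the entry assertion even though every individual call site satisfies the stated contract.

```python
class Ledger:
    """Intended invariant: total == sum(entries)."""

    def __init__(self):
        self.total = 0
        self.entries = []
        self.listeners = []

    def invariant(self):
        return self.total == sum(self.entries)

    def add(self, amount):
        assert self.invariant(), "pre-condition: invariant must hold on entry"
        self.total += amount             # object is now in an intermediate state
        for notify in self.listeners:    # out-call made before the update completes
            notify(self)
        self.entries.append(amount)
        assert self.invariant(), "post-condition: invariant restored on exit"


# A passive observer merely reads state during the callback.
seen = []
ledger = Ledger()
ledger.listeners.append(lambda source: seen.append(source.invariant()))
ledger.add(5)


# A re-entrant observer calls back into the object that is notifying it.
def reenter(source):
    source.add(1)


second = Ledger()
second.listeners.append(reenter)
try:
    second.add(5)
    reentrant_call_failed = False
except AssertionError:
    reentrant_call_failed = True
```

Neither the pre-condition nor the post-condition of add mentions the out-call, so nothing in the contract tells a client that calling back during notification is illegal.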
2.4.1.1 Behavioural subtyping
Behavioural subtyping [LW94] requires that a subtype should never break a program when substituted for its parent type. To ensure behavioural substitutability in class or interface hierarchies in object-based languages, subtypes may only modify a parent’s contract by weakening pre-conditions and strengthening post-conditions. This principle is used in Contract Java[FF01a] to identify cases where pre- and post-conditions may be violated because of behavioural subtype errors in the class or interface hierarchy of a Java program.
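The weakening and strengthening rule can be checked mechanically when contracts are given as predicates. The following sketch is a brute-force Python illustration over small finite sample domains, not the mechanism used by Contract Java; the function name and the example contracts are hypothetical.

```python
def is_behavioural_subtype(parent, child, inputs, outputs):
    """Check contract refinement by brute force over finite sample domains.

    parent and child are (pre, post) pairs where pre(x) and post(x, r) are
    predicates on an input x and a result r.
    """
    p_pre, p_post = parent
    c_pre, c_post = child
    # Pre-conditions may only be weakened: anything the parent accepts,
    # the child must also accept.
    weaker_pre = all(c_pre(x) for x in inputs if p_pre(x))
    # Post-conditions may only be strengthened: on inputs the parent accepts,
    # anything the child promises must satisfy the parent's promise.
    stronger_post = all(p_post(x, r)
                        for x in inputs if p_pre(x)
                        for r in outputs if c_post(x, r))
    return weaker_pre and stronger_post


domain = range(-3, 10)
parent = (lambda x: x > 0, lambda x, r: r >= 0)
good_child = (lambda x: x >= 0, lambda x, r: r > 0)    # weaker pre, stronger post
bad_pre = (lambda x: x > 5, lambda x, r: r >= 0)       # strengthens the pre-condition
bad_post = (lambda x: x > 0, lambda x, r: r >= -1)     # weakens the post-condition
```

Only good_child passes the check; the other two would break a client written against the parent contract.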
2.5 Related Fields
Certain fields in software engineering and computer science share some goals or properties of component software. The following section gives a brief overview of these fields, and their relationship to software components.
2.5.1 ADLs and BISLs
According to Vestal [Ves93], “an [Architecture Description Language] for software applications focuses on the high-level structure of the overall application rather than the implementation details of any specific source module”. There are many approaches to ADLs, from informal description languages and graphical notations, to languages with formal semantics and accompanying tools, such as parsers, compilers, model checkers and code generators [MT97].
ADLs share some of the goals of component-oriented software but many are not strictly component-oriented according to the definition given by Szyperski [Szy02]; the entities that many ADLs describe are not components, nor are they closed under composition. ADLs usually focus on connectors⁶ or connections between parts of a system and analysis of interactions at the connection points. Some ADLs focus on abstract properties such as liveness, while others focus on identifying gauge or probe points to allow instrumentation of systems for performance measurement.
Behavioural Interface Specification Languages (BISLs) are used to annotate the source code of modules or classes with expressive model-based behaviour specifications. While BISLs provide a richer mathematical language for specification than that of languages that support programming by contract, such specifications often cannot be checked at runtime and are used for design and specification purposes only. Examples of BISLs include JML (Java Modeling Language) and the Larch [GH93] family of BISLs.
6 ADL connectors do not map well to any particular system of component based systems, and if they were a software entity, could easily be described as a component.
Allen and Garlan [AG97] show how WRIGHT [All97], an ADL that uses Communicating Sequential Processes (CSP) [Hoa85] as its semantic basis, can be used to analyze architecture descriptions. The protocols of the ports of components and of the roles and glue of connectors in WRIGHT are specified using a subset of CSP, allowing a description to be checked for deadlock freedom. While the method described does not add any semantic power to CSP, Allen and Garlan contend that the WRIGHT ADL provides important abstractions for specifying elements of an architecture, as well as providing encapsulation and distinction between types and instances in a description.
Aladdin [SRW98, SW00] is a tool for performing dependency analysis in Commercial-Off-The-Shelf (COTS) components used in existing systems using ACME [GMW97] or Rapide [The97] descriptions. Aladdin uses intra-component pathways, which connect input ports with output ports, to perform dependency analysis of the parts of an architectural description in order to determine the potential impact of changes to a component of a system. Such specification and analysis techniques share some goals of specification in software components, but are somewhat ineffective if the specifications are not checked and enforced at the programming language level.
Inverardi et al. [CIW99, IWY00] describe methods for using CHAM [BB92], a process formalism inspired by analogies in chemistry, to describe and check the architecture of systems. A component specification consists of a behaviour specification and a context specification describing the expected behaviour of the environment. Their method checks that the context specification of each behaviour matches the combined system behaviour. If all expected behaviours match, then the description is globally deadlock free; otherwise the system may deadlock.
Use Case Maps (UCM) [de 00b] is an approach developed for component-oriented systems.
UCM does not have a precise semantics itself but can be combined with BCOOPL [de 00a] interface specifications to allow analysis. BCOOPL is a protocol-based, concurrent, object-oriented language with asynchronous message passing. Analysis allows detection of mismatches in the BCOOPL interfaces.
Darwin [MDEK94, MK96] is an ADL used to describe distributed systems with dynamic structure, in which the organization of components may change during execution. Components in Darwin are modeled as having a set of provides services and a set of requires services. Components are bound together via services and applications are built by hierarchical composition of components into composite components; composite components have the same composition properties as basic components, meaning that Darwin’s model of components is closed under composition. Darwin has a formal semantics based on the π-calculus and has been used to model the service view of an architecture, as well as the behavioural view of an architecture [CK96, CGK97, KM98, Mag99]. Behavioural modeling of applications in Darwin uses labeled transition systems, and can be used to
verify properties of a program, such as safety or liveness.
The Koala model [vOvdLKM00] bases its model of components on Darwin, allowing specification of components with requires and provides interfaces, binding of components and hierarchical composition; like Darwin, Koala’s model of components is closed under composition. Koala was created to aid development of embedded software for consumer electronics, and provides facilities for implementing and managing components in the face of diversity of evolution of software.
The UML 2.0 specification (UML 2.2 [OMG09] being the latest version at the time of writing) introduced a Components package, which allows modeling of “logical” components, such as business processes, and “physical” components, such as EJB or .NET components. UML component diagrams describe the logical structure of an application, in much the same way as ADLs.
2.5.2 Coordination Languages
Coordination models and languages separate the task of programming into computation (specifically sequential computation) and coordination; a system is said to comprise a computational part, consisting of a number of processes involved with manipulating data, and a coordination part responsible for the communication and cooperation between these processes. Carriero and Gelernter [CG92] suggest that traditional programming languages are incomplete as they only support an ad hoc approach towards coordination. They argue that coordination is best viewed as orthogonal to computation, so a coordination language should be separate from a computation language.
A coordination model can be viewed as a triple (E,L,M) where E represents the entities being coordinated, L is the media on which the entities are bound and M is the semantic framework for binding entities. A coordination language is the embodiment of such a model [PA98].
Coordination models can be classed as data driven (e.g. Linda and its various forms) or control driven (e.g. PCL, Conic). Data driven models provide coordination primitives or mechanisms, which processes use as part of their computation. Control driven models separate coordination from computation by providing a coordination language that is separate from the computation language.
Linda [Gel85, CGZ95] was one of the first coordination languages, although technically Linda is not a complete language itself, but a set of simple coordination primitives. In Linda, if two processes wish to coordinate, the sender process generates a data object called a tuple and places it in a shared data (tuple) space, from which the receiver process can then retrieve it by means of pattern matching.
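The tuple-space style of coordination can be sketched in a few lines. The following Python code is a single-threaded illustration only: the primitive names out, rd and in follow Linda (in is spelled in_ here because in is a Python keyword), but the TupleSpace class and the ANY wildcard are hypothetical, and the retrieval operations return None instead of blocking as Linda's would.

```python
class TupleSpace:
    """A single-threaded sketch of Linda-style coordination primitives."""

    ANY = object()  # wildcard used in patterns, standing in for a formal parameter

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Place a tuple in the shared space."""
        self._tuples.append(tuple(fields))

    def _matches(self, pattern, candidate):
        return len(pattern) == len(candidate) and all(
            p is self.ANY or p == c for p, c in zip(pattern, candidate))

    def rd(self, *pattern):
        """Return the first matching tuple without removing it, or None."""
        for candidate in self._tuples:
            if self._matches(pattern, candidate):
                return candidate
        return None

    def in_(self, *pattern):
        """Remove and return the first matching tuple, or None."""
        found = self.rd(*pattern)
        if found is not None:
            self._tuples.remove(found)
        return found


space = TupleSpace()
space.out("job", 1, "resize")   # the sender knows nothing about the receiver
space.out("job", 2, "crop")
```

A receiver retrieves work with a pattern such as space.in_("job", TupleSpace.ANY, TupleSpace.ANY); sender and receiver are coupled only through the shape of the tuple.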
2.5.3 Protocols, Models and Objects
There are many approaches that annotate or combine objects with models or protocols for the purpose of programming, typing, and analysis. Such approaches generally focus on modeling and investigating properties of object interactions. Many object-protocol combinations enable modeling of object composition and composition checking, which is very relevant to the field of component oriented programming.
2.5.3.1 Concurrency in Object Oriented Programming Languages
Briot et al. [BGL98, BG98] group concurrent object-oriented languages into three categories: the library approach, the reflective approach, and the integrative approach. Languages from the library approach provide libraries for structuring and managing concurrent and distributed programs. Languages from the reflective approach separate normal computation from concurrency control, which is described in terms of meta programs or so-called meta-object protocols; in this sense, Aspect-Oriented languages can be thought of as providing a generalized reflective approach. Languages from the integrative approach combine concepts from concurrency with object-oriented concepts. In particular, languages from the integrative approach that use protocols to integrate synchronization with method calls share some similarity to the language extensions described in this thesis.
Lock based synchronization, such as in Java or C#, is the most commonly adopted integrative approach to concurrency control in object-oriented languages. Lock based synchronization generally associates locks with objects and clients attempt to obtain those locks when entering a synchronized method or code block. Object locks in Java and C# are re-entrant in order to avoid instances of self-inflicted deadlock. In the presence of concurrency, negotiable interfaces can be seen as a generalized lock based synchronization scheme in which locks can be created and destroyed. Negotiable interfaces are not re-entrant by default as a client must be able to demonstrate that it holds the lock before re-entering a method.
Eiffel uses a different integrative approach to synchronization and integrates concurrency control with contracts. The resulting scheme uses pre-conditions for concurrency control and is a very expressive form of guards. Guards associate a boolean condition with a procedure and automatically block or awaken threads based on that boolean condition.
Guide [BLR94] is another language that uses guards to control concurrency.
Many other concurrent object-oriented languages that use protocols for concurrency control are Actor languages [Agh86]. Actor languages have active objects, which are objects integrated with processes; an active object has a unique identity, its own thread of execution, and a message queue, and communicates with other objects by sending messages. Actor languages use behaviour replacement synchronization schemes. Incoming
messages are accepted according to some behaviour, which calculates a replacement behaviour, which in turn calculates the next incoming message to accept.
PROCOL [vL91], TalkTalk [Bd96] and BCOOPL [de 00a] are concurrent object-oriented languages that use path expressions to express synchronization constraints between active objects (objects with a process, or thread of control). Path expressions are usually regular-expression-like and as such give a compact method of specifying object behaviour in terms of synchronization and message passing/processing.
Enabled sets or abstract state synchronization techniques, such as in ACT++ [KL90], specify a set of potential states of an object and the methods that may be called while the object is in those states. Negotiable interfaces can be seen as an enabled sets approach, although the factorable bag representation makes negotiable interfaces more expressive than most enabled sets approaches.
A major problem encountered with integrative synchronization approaches in concurrent object-oriented languages is the so-called inheritance anomaly. The inheritance anomaly is the name given to a set of problems encountered when attempting to inherit from classes with synchronization constraints; in many cases, classes with synchronization can only be subclassed if a large amount of their code is rewritten. The inheritance anomaly is actually caused by a number of issues, including a lack of expressiveness in the synchronization scheme,⁷ subclassing without following the guidelines of behavioural subtyping, and “spaghetti” code that tangles concurrency control with normal logic (cf. cross-cutting concerns, section 2.1.3).
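The enabled sets idea can be illustrated with the stack example used in the footnote on the inheritance anomaly. The sketch below is in Python and is single-threaded: a disabled call raises an exception where a concurrent enabled-sets language would delay the message. The class name EnabledSetStack and the exception MethodNotEnabled are hypothetical and do not reproduce ACT++ or Mentok syntax.

```python
class MethodNotEnabled(Exception):
    """Raised when a method is called in a state that does not enable it."""


class EnabledSetStack:
    """A stack whose callable methods depend on an abstract state."""

    # abstract state -> set of methods that may be called in that state
    ENABLED = {
        "empty": {"push"},
        "notempty": {"push", "pop"},
    }

    def __init__(self):
        self._items = []

    @property
    def state(self):
        return "notempty" if self._items else "empty"

    def _require(self, method):
        if method not in self.ENABLED[self.state]:
            raise MethodNotEnabled(
                f"{method} is not enabled in state {self.state}")

    def push(self, item):
        self._require("push")
        self._items.append(item)

    def pop(self):
        self._require("pop")
        return self._items.pop()
```

Adding a Pop2 method in a subclass requires splitting notempty into a state holding exactly one item and a state holding two or more, which in turn forces the transitions implied by push and pop to be restated. This is the non-trivial change that the inheritance anomaly refers to.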
2.5.3.2 Behavioural Types
Behavioural type systems have been proposed for active objects and, more recently, interfaces for software components. Approaches to behavioural types use process formalisms or automata to specify the behavioural requirements of objects. Type safety in such models concentrates on ensuring that objects receive messages or method invocations in a correct fashion by checking compositions of objects. While composition level checking of behaviours may be desirable in component-based systems, many composition checking approaches based on behavioural types do not scale due to state-space explosion.
The type systems of [Nie95] and [Weh00] utilize behaviour descriptions of both client and server objects to examine the combined behaviour of an object and its clients at composition time. Such approaches are not suitable for handling dynamically changing sets of clients or situations in which aliasing occurs, as the identity of objects needs to be known statically.
The type system presented and developed primarily by Puntigam in [Pun96, Pun97,
7 E.g. Subclassing an “enabled sets” stack class, with states empty, and notempty, to add a method Pop2() which pops two items off the stack requires non-trivial changes.
Pun01] and Puntigam and Peter in [PP99] has a similar scheme for decorating interfaces with (extended) bags of tokens and assigning behaviours to methods. Puntigam and Peter seek to ensure that the sequence of messages received by an object conforms to the (dynamic) type of the object and achieve this by typing individual references to an object. Whenever a reference is passed to another client, the dynamic type of the reference is split between the old and new references. This allows the clients to access services of a server object without querying the object as to its current state; however, if the client wishes to access more services than its reference allows, it must somehow obtain a reference from another client or the server to access those services. This may actually lead to clients becoming coupled, which is an unnatural behaviour for component systems.
Finally, the type system developed by Reussner in [Reu00, Reu01], which is based on earlier work by Reussner and Heuzeroth in [HR99, RH99], extends Java interfaces with augmented finite state automata. Reussner’s approach decorates each interface with a call automaton that specifies the legal orderings of calls, and each method of a component is decorated with an automaton that describes which other services the method uses. Combining the call and method (or function) automata gives the complete behaviour of a component (called the EC-Automaton). The combined automaton can be checked against the behaviour of other components at composition time to accept legal compositions and reject illegal ones. Reussner’s type system also allows for component adaptation in the case that some required services are not present.
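The role of a call automaton can be illustrated with a plain finite state automaton over method names. The sketch below is in Python and does not reproduce Reussner’s augmented automata or the EC-Automaton construction; the CallAutomaton class and the file-like protocol are hypothetical examples.

```python
class CallAutomaton:
    """A finite state automaton over method names (illustrative only)."""

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        self.transitions = dict(transitions)  # (state, method) -> next state

    def accepts(self, calls):
        """True if the call sequence is a legal, complete use of the interface."""
        state = self.start
        for method in calls:
            key = (state, method)
            if key not in self.transitions:
                return False
            state = self.transitions[key]
        return state in self.accepting


# Hypothetical file-like interface: open, then any number of reads, then close.
file_protocol = CallAutomaton(
    start="closed",
    accepting={"closed"},
    transitions={
        ("closed", "open"): "opened",
        ("opened", "read"): "opened",
        ("opened", "close"): "closed",
    },
)
```

A client whose calls form the sequence open, read, read, close is accepted, while a client that reads before opening, or never closes, is rejected.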
2.5.3.3 Petri Nets
Petri nets have been used extensively to model and analyse concurrent and asynchronous protocols [Pet77]. Petri nets have an execution semantics that can be used to investigate the state-space of a modeled system, permitting analysis of properties such as liveness and deadlock. Higher-level Petri nets, which encode information or data in places, tokens and transitions, have been used to model objects in concurrent, object-oriented systems [SB94, Lak97, HV01, PG06]. These approaches have relevance to component programming as they focus on specification of object behaviour and enable analysis of object compositions.
Lakos [Lak97] describes Object Petri Nets (OPNs), which integrate object-oriented concepts, such as inheritance, polymorphism and dynamic binding, into Petri nets. OPNs can model dynamic method availability through use of guards in transitions, and objects can be active or passive.
Sibertin-Blanc’s Communicative and Cooperative Nets [SB94] are used to model dynamic distributed systems behaviour, in which the set of components can change and be reconfigured. Communicative Objects have a close relationship with active objects and communicate via message passing, whereas Cooperative Objects model the method invocation semantics of passive objects using a client-server protocol. Holvoet and Verbaeten [HV01] use Sibertin-Blanc’s Petri nets to specify the protocols of active and passive objects in Objective Linda; the Petri net specifications can be used to define subtyping relationships for objects in Objective Linda.
Pettit and Gomaa [PG06] describe a technique for converting concurrent architectures modeled with UML communication diagrams [OMG09] into Coloured Petri Net (CPN) models [KCJ98]. Pettit and Gomaa identify behavioural stereotypes for the objects in an architecture (e.g. asynchronous interface, periodic algorithm, entities) and create a CPN template for each stereotype. A CPN model is constructed from the UML communication diagram by modeling each object conforming to a behavioural stereotype with the appropriate CPN template and then refining the CPN with context specific behaviour. Pettit and Gomaa use their technique to model a cruise control process and then apply design tools to analyze performance characteristics of the resulting CPN.
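The execution semantics of an ordinary place/transition net, on which the higher-level nets above build, can be sketched briefly. The Python code below is an illustration under simple assumptions: a marking is a dictionary from place names to token counts, a transition is a pair of input and output weight dictionaries, and the request/serve/release net is a hypothetical example rather than one taken from the cited work.

```python
def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    inputs, _ = transition
    return all(marking.get(place, 0) >= n for place, n in inputs.items())


def fire(marking, transition):
    """Consume tokens from input places and produce tokens in output places."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    inputs, outputs = transition
    new_marking = dict(marking)
    for place, n in inputs.items():
        new_marking[place] = new_marking.get(place, 0) - n
    for place, n in outputs.items():
        new_marking[place] = new_marking.get(place, 0) + n
    return new_marking


def is_dead(marking, transitions):
    """A marking is dead when no transition can fire."""
    return not any(enabled(marking, t) for t in transitions.values())


# Hypothetical net: a client requests, is served using a resource, then releases it.
net = {
    "request": ({"idle": 1}, {"waiting": 1}),
    "serve": ({"waiting": 1, "resource": 1}, {"busy": 1}),
    "release": ({"busy": 1}, {"idle": 1, "resource": 1}),
}
initial = {"idle": 1, "resource": 1}
```

Firing request, serve and release in turn returns the token counts to those of the initial marking, whereas a marking with a waiting client and no resource token is dead.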
2.5.4 Model Checking
Model checking is an automated technique for verifying correctness properties of a system [CGL94]; a model checker is a system or program that performs automated model checking. Systems are typically modeled as finite state transition graphs and a search procedure is used to determine if the system satisfies certain specified temporal properties, such as absence of deadlock or error states. Model checking techniques can certainly be used to check properties of modeled component systems; however, component systems are incomplete by definition and can change after deployment and even at runtime, when model checking tools may not be available or may be too expensive to employ.
The Labeled Transition System Analyzer (LTSA) [Mag99] has been used to verify properties of systems modeled in Darwin. In Darwin, the behaviour of a primitive component is modeled as a simple finite state automaton that communicates and changes state via “actions”; this is shared with many of the approaches to behavioural types for active objects and components in section 2.5.3.2. Composite components are the parallel composition of component behaviours. LTSA can be used to search the state space of a composition to see if certain states, like deadlock, can be reached [MKG97]. LTSA can produce traces to erroneous states and can animate composed behaviour interactively [KM98].
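The kind of state-space search described above can be sketched as a breadth-first exploration of a parallel composition of labeled transition systems. The Python code below is an illustration only and is not LTSA’s algorithm; it assumes that an action appearing in several processes must be taken by all of them together, and the helper names client and lock are hypothetical. Any reachable state without outgoing transitions is reported as a deadlock, so intended termination is not distinguished.

```python
from collections import deque


def find_deadlock(processes, starts):
    """Breadth-first search of a parallel composition for a deadlocked state.

    Each process is a dict: state -> {action: next_state}. An action shared by
    several processes' alphabets must be taken by all of them together; other
    actions interleave. Returns the trace to the first state found with no
    outgoing transition, or None if no such state is reachable.
    """
    alphabets = [{a for moves in p.values() for a in moves} for p in processes]
    start = tuple(starts)
    seen = {start}
    queue = deque([(start, [])])
    while queue:
        state, trace = queue.popleft()
        successors = []
        for action in set().union(*alphabets):
            sharers = [i for i, alpha in enumerate(alphabets) if action in alpha]
            # every process that knows the action must be able to take it now
            if all(action in processes[i].get(state[i], {}) for i in sharers):
                nxt = list(state)
                for i in sharers:
                    nxt[i] = processes[i][state[i]][action]
                successors.append((action, tuple(nxt)))
        if not successors:
            return trace
        for action, nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, trace + [action]))
    return None


def client(user, first, second):
    """A client that acquires two locks in order, then releases them."""
    return {0: {f"{user}_get_{first}": 1}, 1: {f"{user}_get_{second}": 2},
            2: {f"{user}_rel_{second}": 3}, 3: {f"{user}_rel_{first}": 0}}


def lock(name, users):
    """A lock that any one of the given users may hold at a time."""
    states = {"free": {f"{u}_get_{name}": f"held_{u}" for u in users}}
    for u in users:
        states[f"held_{u}"] = {f"{u}_rel_{name}": "free"}
    return states
```

Composing client("p", "a", "b") and client("q", "b", "a") with lock("a", "pq") and lock("b", "pq") yields a two-step trace to deadlock in which each client holds one lock; making both clients acquire the locks in the same order removes it.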
3 Scope, Problem and Solution
One of the universal rules of happiness is: always be wary of any helpful item that weighs less than its operating manual. (Terry Pratchett, Jingo)
Advances in programming language technology have brought about abstractions that go some way towards enabling software reuse and a component software methodology. Advances in module technology and object technology have helped to enable some features such as separate compilation, substitutability, polymorphism, and extensibility. It is no surprise, therefore, that modern component technology is primarily object-based, and that component specifications are commonly based on interfaces of object-oriented type systems.¹
Object-based systems can be badly affected by object re-entrance, however, and these problems are of particular concern in component software. Object re-entrance is a potentially global behaviour, meaning that re-entrance can occur across component boundaries, and arise after component composition and well after deployment. As a result, errors due to re-entrance may be difficult to debug or may outright prevent composition of components. Re-entrance conditions, especially across component boundaries, need to be a part of component specifications.
This chapter introduces Mentok, a variant of Component Pascal with novel language features for specifying and constraining object re-entrance. The Mentok compiler, MentokC, is based on the Gardens Point Component Pascal (GPCP) compiler [GRC01] and provides a complete implementation of Mentok as presented in this thesis. All code examples in this thesis have been compiled and executed using MentokC.
Mentok approaches specification of re-entrance conditions at the interface level by modeling object state and changing method availability. Object re-entrance is a problem when objects are stateful and when stateful objects interact, or are self-interfering.
By modeling changing object state and availability of methods as a property of that state in interface types called negotiable interfaces, Mentok enables abstract specification of re-entrance conditions, including out-calls and self-interference patterns, as a property of object state.
1 Object-based languages are not strictly necessary to implement software components, but the use of interfaces as units of specification and for composition tends to make component-based applications object-like as a result.
Mentok also provides a method of specifying re-entrance conditions via relationships between interfaces and between modules called type layers. Type layers enable specification of control flow in a program as a property of the interfaces used by the program. By restricting cyclic control flow patterns through static and dynamic checking of control flow through type layers, re-entrance can be constrained while still allowing patterns of re-entrance explicitly specified by negotiable interfaces or by polymorphism.
Section 3.1 begins by describing the salient features of the programming model assumed and extended by Mentok. Code snippets presented in this section are presented in Component Pascal, with some common coding conventions and syntactic abbreviations; syntax unique to Mentok or added as an extension by the GPCP compiler is described explicitly. Section 3.2 examines some problems that arise when programming components with the model described in section 3.1. Stateful objects and static method and interface availability allow unchecked re-entrance that can cause runtime errors across component boundaries. Examples and problems of object re-entrance given in this section are motivated by the extensive treatment of the problem by Szyperski in [Szy02]. Section 3.3 states the research questions that this thesis attempts to address. Finally, section 3.4 introduces two novel programming language features of Mentok, negotiable interfaces and type layers, that enable specification of changing method and interface availability, and can be used to specify and constrain re-entrance conditions. Negotiable interfaces and type layers are described in greater detail in later chapters.
3.1 A Basic Programming Model for Software Components
Component Pascal is a member of the Pascal[Wir71] family of programming languages, and a superset of the Oberon-2 programming language. Gardens Point Component Pascal (GPCP)[GRC01] is a Component Pascal compiler that targets Sun’s Java Virtual Machine[LY99] and Microsoft’s Common Language Runtime[MR04], and which provides some language extensions to Component Pascal to enable interoperability with the type systems of the JVM and CLR. With the addition of interfaces, GPCP provides a minimal set of language features suitable for component-based programming: modules as units of deployment and composition, interfaces as units of specification, and objects as units of instantiation.
The next subsections informally describe the basic type, object and module models that Mentok assumes and extends. This section assumes a basic familiarity with Component Pascal[Obe97] syntax and semantics; GPCP extensions to the Component Pascal language for interfaces and explicit interface implementation[Mic00] are described in detail.
3.1.1 Notes on Component Pascal
Component Pascal[Obe97] is a refinement of the Oberon-2 programming language, developed by Oberon Microsystems. It is a strongly typed, object-oriented programming language with the syntax and style of the Pascal family of programming languages. Gardens Point Component Pascal (GPCP)[GRC01] is a derivative of Component Pascal, and the GPCP compiler produces binaries for either the CLR[MR04] or the JVM[LY99]. The CLR and the JVM have richer type systems and more language features than Component Pascal, so GPCP has been extended with non-standard syntax to enable interoperability with “native” CLR or JVM binaries (e.g. .NET Framework libraries and Java libraries); additional language constructs are provided for exception handling, enum types, CLR event types, static classes and members, as well as native operators that are not included in the Component Pascal language specification.
The salient features of GPCP used by Mentok (interfaces, records, and modules) are described in detail in the following sections. There are also some features of the Component Pascal language itself that are restricted in Mentok. Implementation inheritance or subclassing is restricted across module boundaries by requiring that all public classes and all classes that implement public interfaces be sealed or final.2 Finally, in this dissertation, many Component Pascal keywords will be omitted for brevity:
• Record attributes, such as LIMITED, ABSTRACT and EXTENSIBLE are omitted, since they are mainly used for controlling implementation inheritance and visibility. A new record attribute, INTERFACE, is introduced for defining interfaces. Interfaces are similar to fully ABSTRACT records, but have different inheritance semantics.
• Method attributes, such as NEW, ABSTRACT, and EXTENSIBLE are omitted, since these are mainly used for designing implementation inheritance. All interface method definitions are assumed to be NEW and ABSTRACT.
• IN, OUT, and VAR parameters are omitted from syntax specifications since Mentok’s language features do not interact in an interesting way with these parameter kinds.
• Language features for defining visibility of declarations are also omitted, since Mentok’s language features do not change Component Pascal’s visibility model. All declarations in code snippets are assumed to be public, unless otherwise noted.
2 This design decision was made as implementation inheritance is considered problematic in component software for reasons outlined in [FF01b, Szy02].
All examples in this dissertation deal with object (reference) types, as value record and interface types do not have any significant meaning in component based programming. This will be emphasized by using the “POINTER TO Type” form for interface and record declarations.
3.1.2 Interface Model
The model of interfaces used in GPCP and in this dissertation is similar to that used in popular object-oriented programming languages with static, strong typing such as C# or Java. Interfaces are named types that are indirect[Szy02], a requirement for late-binding and substitutability, and fully abstract, meaning that no implementation is inherited when implementing an interface. Interfaces may inherit from a set of parent interfaces, and classes may implement a set of interfaces. The common naming convention of prefixing interface names with “I” is not enforced in the language, but is used in all examples for clarity.
One difference between interfaces as provided by GPCP and those provided by Mentok is that all Mentok interfaces are always explicitly implemented. Explicit implementation is optionally available in C# (as a feature of the CLR’s type system), and enables a class to implement an interface without adding the implemented methods to the namespace of the class. Explicit implementation enables implementation of sets of interfaces that have methods with syntactically clashing signatures, but adds a requirement that interface methods be invoked via an interface reference. Mentok’s type system relies on this property.
Listing 3.1 illustrates several interface definitions using GPCP syntax for interface declaration. As fully abstract, stateless classes, interfaces in GPCP and Mentok share some of the record and procedure syntax of Component Pascal but differ in the following ways:
• Interfaces are declared using record syntax using the new INTERFACE keyword as shown by all four interfaces.
• Interface inheritance is declared by use of the “(+ ...)” syntax immediately following the RECORD keyword. IHostableContainer inherits from three other interfaces, ISizeable, IItem, and IHost.3
• Interfaces may not have any member variables since they are fully abstract.
3 The + syntax originates from Gardens Point Component Pascal, where the inheritance declaration of a class or interface is “( BaseClass { "+" Interface } )”, but since interfaces cannot inherit from classes, and implementation inheritance is avoided across component boundaries, the initial base class identifier is always omitted.
TYPE
  ISizeable = POINTER TO INTERFACE RECORD END;
  IHost = POINTER TO INTERFACE RECORD END;
  IItem = POINTER TO INTERFACE RECORD END;
  IHostableContainer = POINTER TO INTERFACE RECORD (+ ISizeable + IItem + IHost) END;

PROCEDURE (this: ISizeable) Size(r: Rect);
PROCEDURE (this: IItem) Host(h: IHost);
PROCEDURE (this: IItem) Free(h: IHost);
PROCEDURE (this: IHost) Unlock();
PROCEDURE (this: IHost) Lock();
PROCEDURE (this: IHost) HostItem(i: IItem);
PROCEDURE (this: IHost) FreeItem(i: IItem);
Listing 3.1: Basic interface declaration.
Figure 3.1: Basic interface model.
• Method definitions of an interface are not lexically nested within the record decla- ration, as in Java or C#, but instead are declared with all the other procedures of the application following the type and variable declarations of a module. An interface’s methods are identified by the type of the receiver parameter. In this thesis, the receiver will always be named “this”. Interface methods are always NEW and ABSTRACT in terms of Component Pascal’s modifiers.
Figure 3.1 presents a diagrammatic representation of IHostableContainer and its parent interfaces. This diagrammatic form, which omits some syntactic detail, will be used throughout this dissertation.
3.1.3 Object Model
The object model extended by Mentok (that of GPCP) is common to object-oriented languages such as C# and Java, and is also the basis for component models and frameworks in industry.
Objects are units of instantiation and provide services via one or more interfaces. Objects are passive (i.e. they do not have their own process or thread of control), and method invocation is synchronous and blocking. Objects are generally stateful (i.e. objects maintain state for longer than a single atomic method call) and state is manipulated and exposed as the result of calling methods provided by an object. Clients can also test and guard the type of objects, which is useful for discoverability of implemented interfaces in extensible systems. Object type hierarchies are static, and type equivalence and subtyping relationships are nominal rather than structural.
Object types in Mentok are declared using the RECORD syntax. Listing 3.2 illustrates a simple record declaration for a class called SizeableItem, which implements the interfaces IItem and ISizeable:
• The POINTER TO RECORD identifies SizeableItem as an object type, which will be allocated on the heap.
• The member variable declarations myHost and mySize are lexically nested inside of the record declaration.
• The methods of SizeableItem are declared in lexically separate PROCEDURE blocks and are identified as belonging to SizeableItem by the type of the receiver parameter, which follows the PROCEDURE token and precedes the method name. By convention, the name of the receiver parameter will always be this in this thesis.
TYPE
  SizeableItem = POINTER TO RECORD (+ IItem + ISizeable)
    myHost : IHost;
    mySize : Rect;
  END;

PROCEDURE (this: SizeableItem.IItem) Host(h: IHost);
BEGIN
  ASSERT(this.myHost = NIL);
  this.myHost := h;
END Host;

PROCEDURE (this: SizeableItem.IItem) Free(h: IHost);
BEGIN
  ASSERT(this.myHost = h);
  this.myHost := NIL;
END Free;

PROCEDURE (this: SizeableItem.ISizeable) Size(r: Rect);
BEGIN
  this.mySize := r;
END Size;
Listing 3.2: Basic record declaration.
The syntax for inheriting interfaces is again the “(+ ...)” syntax, while the syntax for implementing interface methods deserves special note. Note that the type of the receiver parameter in listing 3.2 is SizeableItem.IItem for methods Host and Free and SizeableItem.ISizeable for the method Size. The SizeableItem part of the type denotes that the method belongs to the SizeableItem class. The IItem or ISizeable part denotes that the method is provided as part of the implementation of those respective interfaces. This is Mentok’s syntax for explicit interface implementation. Explicit implementation allows implementation of multiple interfaces with syntactically clashing signatures (i.e. same method name, parameters, but different return type) or semantically clashing methods (i.e. same method names, parameters, and return types, but different intended semantics or contract). Although this syntax is specific to Mentok, explicit implementation itself is not new or novel, and the syntax is presented here for reference.
In Mentok, classes are not permitted to have a default public interface (i.e. classes do not have public methods). Classes, and therefore objects, must provide publicly consumable services via explicitly implemented interfaces. This means that external clients of an object must have an interface reference to an object in order to invoke methods. As noted earlier, the implementation inheritance or subclassing features of Component Pascal are not used in Mentok, as implementation inheritance, especially across component boundaries, can lead to problematic patterns of self-interference[FF01b, Szy02] and result in the fragile base class problem described in section 2.1.1.3.
Figure 3.2 gives an example of the diagrammatic representation of objects and clients that will be used throughout this dissertation. The object represented in this case is an instance of the record declared in listing 3.2. The interfaces provided by an object are represented by “pins”, to which clients have references.
Figure 3.2: Basic object model.
Figure 3.3 demonstrates an activity diagram that will be used to informally illustrate control flow concepts in Mentok. In this example, a client invokes a method through an interface of an object, performs a type test to discover another interface provided by the object, and invokes another method.
Figure 3.3: Basic execution model.
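The effect of explicit interface implementation and of the type test described above can be approximated in a short sketch, written in Python rather than in Mentok. It only imitates the idea: InterfaceRef and as_interface are hypothetical names, and Python enforces none of this statically. The object exposes no public methods of its own; a client must first obtain an interface reference (a "pin") and can discover further interfaces by asking for them by name.

```python
class InterfaceRef:
    """A reference to one interface of an object; only that interface's methods are visible."""

    def __init__(self, name, methods):
        self.name = name
        for method_name, implementation in methods.items():
            setattr(self, method_name, implementation)


class SizeableItem:
    """Mirrors the record of listing 3.2; all services are reached through interface references."""

    def __init__(self):
        self._my_host = None
        self._my_size = None
        # Explicit implementations, grouped by the interface they belong to.
        self._refs = {
            "IItem": InterfaceRef("IItem", {"Host": self._iitem_host,
                                            "Free": self._iitem_free}),
            "ISizeable": InterfaceRef("ISizeable", {"Size": self._isizeable_size}),
        }

    def as_interface(self, name):
        """Type test: return the interface reference, or None if it is not implemented."""
        return self._refs.get(name)

    def _iitem_host(self, h):
        assert self._my_host is None
        self._my_host = h

    def _iitem_free(self, h):
        assert self._my_host == h
        self._my_host = None

    def _isizeable_size(self, r):
        self._my_size = r


obj = SizeableItem()
item = obj.as_interface("IItem")          # client holds an IItem reference
item.Host("some host")                    # invoke through the interface
sizeable = obj.as_interface("ISizeable")  # type test discovers a second interface
if sizeable is not None:
    sizeable.Size((0, 0, 10, 10))
```

Because Host, Free and Size are never added to the namespace of SizeableItem itself, two interfaces could both declare a method called Size without clashing; each reference would simply be bound to a different implementation.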
3.1.4 Module Model
Modules are units of separate compilation and units of deployment in GPCP, with each module being compiled into a separate binary library. Mentok uses GPCP’s module model as a simple model of components, with composition of modules performed through static compilation or dynamic loading. Modules may declare and export types and procedures, as in normal Component Pascal. In this dissertation, interface definitions and implementations will commonly be separated into separate modules to model the concept of “interface in the middle”[Szy00]; the language does not prescribe such separation, however, as Modula-2 does. Modules will often be termed definition or implementation modules, but this is just a convention.4
The loading and deployment model for modules leverages the .NET assembly model. Each module is compiled to a .NET assembly (a self-describing DLL), which can be loaded dynamically and deployed individually. Component composition is achieved statically by importing modules, or dynamically by loading modules via reflection. Separation of definitions from implementations gives the import relationship three different interpretations.
• Definition imports Definition: Interface definition modules can import other definition modules to extend imported interfaces or reference imported interfaces for tightly bound, cooperating interfaces.
• Implementation imports Definition: Implementation modules can import interface definition modules to implement types, or import interfaces so that the implementation can use their services.
• Implementation imports Implementation: Implementation modules can import other implementation modules for static composition. The imported module must now be deployed as part of a composite component.
4 Modules containing simple libraries (e.g. from platform frameworks) are not treated specially.
Figure 3.4: Module composition model.
The fourth possible interpretation, Definition imports Implementation, breaks abstraction and layering, and so is not considered.
Figure 3.4 illustrates an example of module composition for a simple windows application. The application component, composed of the SimpleApp, Window, Panel, and BasicDraw modules, is loaded and run by the atomic AppLoader component. The implementation components import parts of the definition component composed of the module App.
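The import interpretations can be made concrete with a small sketch, written in Python as a stand-in. The module names are taken from the description of Figure 3.4, but the individual import edges are assumptions made for illustration, as are the function names. The first function flags any Definition imports Implementation edge; the second follows Implementation imports Implementation edges to find the modules that must be deployed together as a composite component.

```python
DEFINITION = "definition"
IMPLEMENTATION = "implementation"

def layering_violations(kinds, imports):
    """Return every (importer, imported) edge where a definition imports an implementation."""
    return [(importer, imported)
            for importer, targets in sorted(imports.items())
            for imported in targets
            if kinds[importer] == DEFINITION and kinds[imported] == IMPLEMENTATION]

def composite(module, kinds, imports):
    """Modules deployed together with `module` through static implementation imports."""
    seen, stack = set(), [module]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(target for target in imports.get(current, ())
                     if kinds[target] == IMPLEMENTATION)
    return seen

# Hypothetical import edges; only the module names come from the text.
kinds = {"App": DEFINITION, "SimpleApp": IMPLEMENTATION, "Window": IMPLEMENTATION,
         "Panel": IMPLEMENTATION, "BasicDraw": IMPLEMENTATION}
imports = {"SimpleApp": ["App", "Window", "Panel"],
           "Window": ["App", "BasicDraw"],
           "Panel": ["App", "BasicDraw"],
           "BasicDraw": ["App"]}

print(layering_violations(kinds, imports))
print(sorted(composite("SimpleApp", kinds, imports)))
```

Under these assumed edges there are no violations, and the composite rooted at SimpleApp contains SimpleApp, Window, Panel and BasicDraw, while the definition module App stays outside it.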
3.2 Problems with the Basic Model: State and Object Consistency
The language features and programming model described in the previous section are similar, if somewhat simplified, to the basic programming model used to build software components in modern component applications and frameworks. There is a disconnect between the amount of specification afforded by static interface types and module signatures, and the temporal properties of stateful objects and of running programs consisting of stateful objects. This section examines this gap in specifications, and problems of object re-entrance that can arise as a result. Later sections will show language constructs provided by Mentok that attempt to address these deficiencies.
3.2.1 Static Interface Types: Static Method Availability
Static interface types present a component role as a set of named methods, together with type constraints on the values that are passed to, and returned by, those methods. This means that static interface types can be used to prevent clients from calling unknown services, or from calling known services with incorrectly typed parameter or return values. Assuming these conditions are met, the methods of a static interface are available to be called at any time by any client.
In general, the roles described by interfaces may be stateful. This is not to say that an interface type is not fully abstract, but that the role it describes can only be implemented by a component instance or object that maintains, manipulates, and exposes some state across or during method execution. In these cases, it is possible that the availability of the methods of an interface depends on the state of the implementing object. Static interface types, however, cannot express the implied state of a component role, nor how the availability of methods changes over time.
Figure 3.5 gives a simple example demonstrating the lack of expressiveness of static interface types. The interface, INamedBuffer, is intended to abstractly specify a stateful role, illustrated by the state transition diagram. Initially, a named buffer is closed and unnamed, meaning that the only method available initially is the Name method. Following a call to Name, the buffer can be Opened, and then populated and emptied by sequences of calls to Put and Get. The Close method is used on an open and empty buffer, and discards the buffer.
Listing 3.3 demonstrates two simple cases in which a client misuses an object implementing INamedBuffer. The first case, InitializeBuffer, is a simple oversight by the programmer: the buffer is not opened before being populated. The second procedure, RenameBuffer, is a little more subtle: the contract of the named buffer does not guarantee that a named buffer can be renamed after being closed.
(The contract does not prescribe that MyId() should change as a result of being Closed.) This oversight may or may not be detected depending on the implementation of INamedBuffer used for testing.

TYPE
  INamedBuffer = POINTER TO INTERFACE RECORD
    (* Model variables:
         id   : Id   := NIL
         open : BOOL := FALSE
         slot : Item := NIL *)
  END;

PROCEDURE (this: INamedBuffer) Name(id: Id);
  (* Pre:  this.id = NIL
     Post: this.id' = id *)
PROCEDURE (this: INamedBuffer) Open();
  (* Pre:  (this.id # NIL) & ~this.open
     Post: this.open' *)
PROCEDURE (this: INamedBuffer) Put(i: Item);
  (* Pre:  this.open & (this.slot = NIL)
     Post: this.slot' = i *)
PROCEDURE (this: INamedBuffer) Get() : Item;
  (* Pre:  this.open & (this.slot # NIL)
     Post: (this.slot' = NIL) & return value = this.slot *)
PROCEDURE (this: INamedBuffer) Close();
  (* Pre:  this.open & (this.slot = NIL)
     Post: ~this.open' *)
PROCEDURE (this: INamedBuffer) MyId() : Id;
  (* Pre:  TRUE
     Post: return value = this.id *)
Figure 3.5: Named buffer state diagram and interface contract.

PROCEDURE InitializeBuffer();
  VAR b : INamedBuffer;
BEGIN
  b := NewBuffer();   (* Create buffer *)
  b.Name(someID);     (* Name buffer -- OK *)
  b.Put(item);        (* Error: Buffer is not open! *)
  ...

PROCEDURE RenameBuffer(b: INamedBuffer);
BEGIN
  ASSERT(b.open & b.empty);   (* Buffer is open and empty *)
  b.Close();                  (* Close buffer -- OK *)
  b.Name(someID);             (* Error: Is the buffer still named? *)
  ...
Listing 3.3: Illegal use of named buffers.

Figure 3.6: Tightly bound re-entrance.

Changing method availability, such as in the example above, can often be expressed via pre- and post-condition contracts. Pre- and post-condition contracts can generally only be checked at runtime, however, meaning that simple errors must be found with a comprehensive testing strategy. This can be somewhat problematic for software components, which are deployed before composition, and deployed in many different contexts. The problem is exacerbated by the fact that, in the presence of out-calls, method execution is not atomic, and objects can be re-entered.
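To show what runtime checking of such a contract amounts to, the following sketch re-expresses the INamedBuffer contract of Figure 3.5 as precondition checks. It is written in Python as a runnable stand-in and is not Mentok code; ContractError, require and violation are hypothetical helpers, and the class is only one possible implementation of the role. Both misuses from listing 3.3 are rejected, but only when the offending call is actually executed.

```python
class ContractError(Exception):
    """Raised when a caller violates a method precondition."""


def require(condition, message):
    if not condition:
        raise ContractError(message)


class NamedBuffer:
    """One possible implementation of the INamedBuffer role with runtime-checked preconditions."""

    def __init__(self):
        self.id = None       # model variable id   := NIL
        self.open = False    # model variable open := FALSE
        self.slot = None     # model variable slot := NIL

    def Name(self, new_id):
        require(self.id is None, "Name: buffer is already named")
        self.id = new_id

    def Open(self):
        require(self.id is not None and not self.open, "Open: buffer unnamed or already open")
        self.open = True

    def Put(self, item):
        require(self.open and self.slot is None, "Put: buffer closed or full")
        self.slot = item

    def Get(self):
        require(self.open and self.slot is not None, "Get: buffer closed or empty")
        item, self.slot = self.slot, None
        return item

    def Close(self):
        require(self.open and self.slot is None, "Close: buffer closed or not empty")
        self.open = False

    def MyId(self):
        return self.id


def violation(action):
    """Run `action`; return the contract violation message, or None if it succeeded."""
    try:
        action()
    except ContractError as error:
        return str(error)
    return None


b = NamedBuffer()
b.Name("some id")
print(violation(lambda: b.Put("item")))    # first misuse: Put before Open
b.Open()
b.Close()
print(violation(lambda: b.Name("other")))  # second misuse: Name after Close
```

In this implementation the identifier survives Close, so renaming fails; an implementation that reset the identifier on Close would satisfy the same contract and accept the call, which is why the second error depends on the implementation used for testing.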
3.2.1.1 Tightly bound re-entrance
When an object’s method executes, the object may undergo a state transformation. In the general case, method execution is not atomic, as a method may call out to other objects, causing those objects to undergo state transformations. During out-calls, objects may be re-entered by invocation of another method, possibly exposing intermediate state.
Figure 3.6 gives an activity diagram for a problematic case of re-entrance, as explored by Szyperski[Szy02]. The scenario starts with a change in the Model, which in turn begins to notify its views. When the BrokenView is notified, it objects to the change, and tells the Model to revert the change. This begins a problematic re-entrant call to Set, which changes the value and then notifies all its clients of the second change. Finally, the model notifies the second view a second time about the updated value. This scenario could cause errors in more than one location, but surprisingly, none of the errors will occur in the code of the BrokenView, which is the source of the errors. Listings 3.4, 3.5, and 3.6 give source code listings for the IModel and IView interfaces, the implementation for Model, and implementations for BrokenView and View.
The first potential problem from the scenario above occurs in the View. The View is told that the key k is updated only after it has been removed from the Model by the BrokenView. Since the View will probably not be displaying the key in question, the call to remove the key from the View’s display may fail. This is an example of failed inter-object consistency. The Model and all its views were not in a consistent state when the re-entrant call updated the model again, leaving the second view a step behind the other objects. It is also worth noting that the View is updated a second time, as a result of the original Set. If the updated value was sent with the key, the View might show an incorrect value, breaking the contract of IView.Notify.
The second problem becomes evident upon examining the contracts for IModel and IView (in listing 3.4). After a call to Set, an object implementing IModel should associate the new value with the updated key. This post-condition is satisfied for the re-entrant call by the broken view, but on completion of the outer call to Set, this post-condition is broken.
Scenarios like the above only occur when objects are composed, which, for component-based applications, may be after components are deployed. Fixing such a problem is non-trivial since the code from all cooperating components may need to be inspected. This may be impossible if the source code is not available for the components, or impractical if the application builder is not a domain expert. The problem is further exacerbated since the error is not correctly isolated; the error is detected in one component (the Model) but caused by another (BrokenView).
Szyperski goes on to show in [Szy02] how component contracts may be proofed by adding test functions to indicate if an object is in an intermediate state. This is entirely possible but requires careful examination of the contracts of all participating interfaces. It also must be done on a case-by-case basis, since the static type systems of object-based languages permit re-entrance by default. Lack of specification of re-entrance conditions for tightly-bound interfaces is clearly problematic. So not only are re-entrance conditions in the form of method availability impossible to express in static interface types, they are also difficult to express correctly using pre- and post-condition contracts.
Re-entrance can easily also occur across loosely-bound interfaces (i.e. interfaces which are dependent in an application, but do not explicitly mention each other in their specifications). The next section examines these issues further.
TYPE
  IModel = POINTER TO INTERFACE RECORD END;

PROCEDURE (this: IModel) Set(k: Key; v: Value);
  (* Pre:  TRUE
     Post: (ValueOf(k) = v) & all views are notified *)

PROCEDURE (this: IModel) ValueOf(k: Key) : Value;
  (* Pre:  TRUE
     Post: IF contains pair (k,v) THEN result = v
           ELSE result = NIL *)

...

TYPE
  IView = POINTER TO INTERFACE RECORD END;

PROCEDURE (this: IView) Notify(k: Key);
  (* Pre:  this.model # NIL
     Post: view changed for this.model.ValueOf(k) *)
Listing 3.4: Interface specification for IModel and IView.
TYPE
  Model = POINTER TO RECORD (+ IModel)
    views: IViewList;
    table: Table;
  END;

PROCEDURE (this: Model.IModel) Set(k: Key; v: Value);
  VAR i : INTEGER;
BEGIN
  this.table.SetValue(k, v);               (* set the value *)
  FOR i := 0 TO this.views.Count() - 1 DO
    this.views.view(i).Notify(k);          (* notify views *)
  END;
END Set;

PROCEDURE (this: Model.IModel) ValueOf(k: Key) : Value;
BEGIN
  IF this.table.Contains(k) THEN
    RETURN this.table.Value(k);            (* return value *)
  ELSE
    RETURN NIL;                            (* return NIL *)
  END;
END ValueOf;

...
Listing 3.5: Implementation of IModel.
TYPE
  BrokenView = POINTER TO RECORD (+ IView)
    model : IModel;
    display: IDisplay;
  END;

PROCEDURE (this: BrokenView.IView) Notify(k: Key);
  VAR newVal : Value;
BEGIN
  newVal := this.model.ValueOf(k);
  IF this.Censors(k) THEN
    IF newVal = NIL THEN
      RETURN;                        (* Censored key already removed. *)
    ELSE
      this.model.Set(k, NIL);        (* Revert the update. *)
    END
  ELSE
    IF (newVal = NIL) THEN
      this.display.Remove(k);        (* Remove k from display. *)
    ELSIF ~this.display.Contains(k) THEN
      this.display.Add(k);           (* Add k to the display. *)
    ELSE
      this.display.MoveToEnd(k);     (* Move k to the end of the display. *)
    END;
  END;
END Notify;

...

TYPE
  View = POINTER TO RECORD (+ IView)
    model : IModel;
    display: IDisplay;
  END;

PROCEDURE (this: View.IView) Notify(k: Key);
  VAR newVal : Value;
BEGIN
  newVal := this.model.ValueOf(k);
  IF (newVal = NIL) THEN
    this.display.Remove(k);          (* Remove k from display. *)
  ELSIF ~this.display.Contains(k) THEN
    this.display.Add(k);             (* Add k to the display. *)
  ELSE
    this.display.MoveToEnd(k);       (* Move k to the end of the display. *)
  END;
END Notify;
Listing 3.6: Implementations of IView.
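The scenario of listings 3.4 to 3.6 can be replayed in a small simulation, written in Python as a runnable stand-in for the Mentok code. It is a sketch under stated assumptions: the table and display are plain Python containers, setting a value of None removes the key, the helper refresh and the log of failed removals are hypothetical additions, and the BrokenView is assumed to be registered before the View.

```python
class Model:
    def __init__(self):
        self.table = {}
        self.views = []

    def Set(self, k, v):
        if v is None:
            self.table.pop(k, None)      # NIL value: remove the key
        else:
            self.table[k] = v
        for view in self.views:          # out-calls: the model can be re-entered here
            view.Notify(k)

    def ValueOf(self, k):
        return self.table.get(k)


def refresh(name, display, model, k, log):
    """Shared display update; records removals of keys that were never displayed."""
    if model.ValueOf(k) is None:
        if k in display:
            display.remove(k)
        else:
            log.append(name + ": asked to remove undisplayed key " + k)
    elif k not in display:
        display.append(k)
    else:
        display.remove(k)
        display.append(k)                # move k to the end of the display


class View:
    def __init__(self, model, log):
        self.model, self.display, self.log = model, [], log

    def Notify(self, k):
        refresh("View", self.display, self.model, k, self.log)


class BrokenView:
    def __init__(self, model, log, censored):
        self.model, self.display, self.log = model, [], log
        self.censored = censored

    def Notify(self, k):
        if k in self.censored:
            if self.model.ValueOf(k) is not None:
                self.model.Set(k, None)  # re-entrant call: revert the update
        else:
            refresh("BrokenView", self.display, self.model, k, self.log)


def run_scenario():
    log = []
    model = Model()
    model.views = [BrokenView(model, log, {"secret"}), View(model, log)]
    model.Set("secret", "x")
    return model.ValueOf("secret"), log


value_after_set, failures = run_scenario()
print(value_after_set)
print(failures)
```

After the outer call to Set returns, the value associated with the key is None rather than "x", so the post-condition of Set is broken, and the View has twice been asked to remove a key it never displayed. Neither symptom appears inside BrokenView, which is where the re-entrant call originates.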
3.2.2 Static Module Signatures: Static Interface Availability
Separation of interfaces and implementations, as is common with component-based programming, gives rise to an interesting and potentially problematic issue: breaking of layering. Any interface definition can be statically referenced by a component without introducing cycles in the import order; the principle of specifications in the middle makes the interfaces of an application statically available to any component. In an analogous fashion to static method availability, static interface availability allows any client to make a call through any interface it statically references.
Figure 3.7 illustrates this problem. The interface definition modules hint at the intended application structure: a common tiered application architecture, with a presentation layer (specified by the types in module Presentation) calling into a business logic layer (module Logic), which calls into the data layer (module Data). The actual application architecture is somewhat different. The application module composes two sub-components: module BusinessLogic, which implements the business logic, and PresentationData, which implements both the presentation layer and the data layer.5 So rather than a clear layered architecture, there exists an odd cyclic dependency between two modules, BusinessLogic and PresentationData, possibly leading to cyclic dependencies between loosely coupled interfaces. The logic layer may call into the data layer, which is also the presentation layer, which may call back into the logic layer!
Although this is an extreme example, it illustrates how the components of an application can alter the layering intended by the specification. The separation of interfaces from implementation makes the definitions available to any component (as long as they import the relevant type library). This is a flaw in the process of programming; the combined specifications of an application may hint at an application architecture but they do not define an architecture.
The next section demonstrates more clearly how this can lead to cyclic patterns of re-entrance, even for loosely bound interfaces.
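The cyclic dependency hidden behind acyclic imports can be exposed with a small analysis, sketched here in Python. It is not a Mentok feature: the functions and the dictionary encoding are assumptions, and only the module names come from the description of Figure 3.7. An edge is drawn from one implementation module to another whenever the first calls through an interface layer that the second implements, and a depth-first search then looks for a cycle.

```python
def may_call_graph(implements, calls):
    """Edge A -> B when module A calls through an interface layer that module B implements."""
    graph = {module: set() for module in implements}
    for caller, used_layers in calls.items():
        for callee, provided_layers in implements.items():
            if set(used_layers) & set(provided_layers):
                graph[caller].add(callee)
    return graph

def find_cycle(graph):
    """Depth-first search; return one cycle as a list of modules, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for successor in sorted(graph[node]):
            if colour[successor] == GREY:          # back edge closes a cycle
                return path[path.index(successor):] + [successor]
            if colour[successor] == WHITE:
                found = visit(successor)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in sorted(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# The structure described for Figure 3.7: two implementation modules, three layers.
implements = {"BusinessLogic": ["Logic"],
              "PresentationData": ["Presentation", "Data"]}
calls = {"BusinessLogic": ["Data"],        # logic calls into the data layer
         "PresentationData": ["Logic"]}    # presentation calls into the logic layer

print(find_cycle(may_call_graph(implements, calls)))
```

For this structure the search reports the cycle BusinessLogic, PresentationData, BusinessLogic, even though neither implementation module imports the other. Splitting PresentationData into separate presentation and data modules would remove the cycle without changing any interface definition.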
3.2.2.1 Loosely bound re-entrance
The static availability of interfaces can also lead to re-entrance across loosely-bound interfaces. In some cases, this is more of a problem than issues with tightly-bound re-entrance as it can lead to cyclic calling patterns over an entire program. Poorly designed applications, such as the one in the preceding section, will almost certainly have confusing cyclic calling patterns, but there are cases when cyclic calling patterns are desirable so long as they are handled correctly.
Figure 3.8 elaborates a possible case of re-entrance for the code snippet in listing 3.7. The programmer has implemented the two interfaces, IPostMessage and IApp, in a single class called Application. Again the programmer has decided to add a shortcut to the implementation, by shutting down the application immediately upon detection of a Close message in IPostMessage. The premature halt results in the data source being freed before the data source could be saved, as well as leading to a re-entrant call to deactivate the widget before it has finished handling its last message.
Once again this is a simplistic example, but it demonstrates how the static availability of interfaces can lead to poorly structured applications. Cases like this are harder to proof against with progress condition contracts since re-entrance has occurred due to dependencies across loosely bound interfaces; it is difficult to formally specify that IPostMessage should not result in calls to other, possibly unrelated, interfaces. Even when specifications themselves are intended to provide a well-structured layered architecture, implementations can implement them in vastly different fashions. For component-based applications, where specifications are the means for communicating between consumers and producers, this is a critical flaw.
5 Since the logic and combined presentation/data modules both reference the interface definitions needed to call into each other, they do not need to import each other.
Figure 3.7: Application structure defined by components.
Figure 3.8: Loosely bound re-entrance.
TYPE
  IApp = POINTER TO INTERFACE RECORD END;
  IPostMessage = POINTER TO INTERFACE RECORD END;

PROCEDURE (this:IApp) MsgLoop();
PROCEDURE (this:IPostMessage) Post(m:IMsg);
...

TYPE
  Application = POINTER TO RECORD (+IApp + IPostMessage)
    window : IWindow;
    dataSource : IDataSource;
    msgs : MsgList;
  END;

PROCEDURE (this:Application.IApp) MsgLoop();
VAR
  m : IMsg;
BEGIN
  LOOP
    IF this.msgs.Count() # 0 THEN
      m := this.msgs.Dequeue();
      CASE m.Type() OF
      (* Close - close the window, free the data source and exit *)
      | Close : this.window.Close(); this.dataSource.Free(); RETURN;
      (* Save - commit the data source *)
      | Save : this.dataSource.Commit();
      (* Widget - route message to the destination *)
      | Widget : m.Dest().Handle(m.MsgPars());
      (* ELSE Unknown message - panic! *)
      END;
    END;
    Runtime.Yield();
  END;
END MsgLoop;

PROCEDURE (this:Application.IPostMessage) Post(m : IMsg);
BEGIN
  ASSERT(m # NIL);
  IF m.Type() = Close THEN
    (* Shortcut - free everything and shutdown now. *)
    this.window.Close();
    this.dataSource.Free();
    HALT(0);
  ELSE
    (* otherwise add the message to the queue *)
    this.msgs.Enqueue(m);
  END;
END Post;
Listing 3.7: Loosely bound re-entrance.
3.3 Research Question
Section 3.2 demonstrates two related but distinct problems of object re-entrance. Neither problem can be prevented using the specification capabilities afforded by the programming language alone:
1. Static interface types allow methods to be invoked at any time, regardless of the state of the implementing objects; object re-entrance across tightly-bound interfaces cannot be constrained using interface types.
2. Conventional module signatures allow interfaces to be referenced and called regardless of the state of the program; re-entrance across loosely-bound interfaces cannot be specified and constrained using static dependencies.
These problems lead to the research question for this dissertation:
Can a component-oriented programming language be extended with constructs that permit specification of re-entrance conditions for components?
Any approach to answering the research question should address the two problems identified above and permit specification of re-entrance conditions across tightly- and loosely-bound interfaces. Furthermore, the specifications should attempt to satisfy the criteria outlined in section 1.2 and be contractual, formal, sufficiently expressive and sufficiently abstract.
3.4 Mentok: Enhanced Specifications
Component interfaces need re-entrance condition specifications, but static interface types cannot express re-entrance conditions at the method level and static module signatures cannot express re-entrance conditions at the interface level. Mentok was developed to enable specification of both these conditions.
The first novel language feature of Mentok is called negotiable interfaces, which attempt to solve the problem of specification of re-entrance conditions across method calls of an object. Negotiable interfaces enable specification of stateful component roles by adding a protocol, based on abstract tokens of state, that models changing method availability. Negotiable interface protocols do not treat method execution as atomic, and allow specification of re-entrant or recursive patterns of calls, as well as out-calls. Negotiable interfaces use a combination of static and dynamic checking (the “negotiation” part of negotiable interfaces) to ensure that clients use an object according to its interface protocols, including re-entrance conditions.
The second new language feature of Mentok is called type layers, which attempt to solve the problem of specification of re-entrance conditions across interfaces of an application. Type layers build upon negotiable interfaces by adding declarative “layer” specifications to the signatures of modules. The type layers of an application’s modules combine to form an abstract control flow specification. The declarative nature of type layers enables a simple composition check to ensure that the interfaces of an application define a truly layered (acyclic) control flow. A combination of static and dynamic checking at negotiation points is then used to constrain re-entrant calling patterns in loosely-bound interfaces by preventing cyclic control flow through interfaces.
This section introduces the two new language features by way of simple examples, using the cases from the previous section. The next two subsections briefly describe negotiable interfaces and type layers in turn, with some examples of the specifications these features enable; both features are described in depth in chapters 4 and 5, respectively.
3.4.1 Negotiable Interfaces: Dynamic Method Availability
Negotiable interfaces are an attempt to bridge the specification gap between static interface types and the stateful roles that interface types can be used to specify. Traditional interface types present stateful roles as stateless, statically available sets of methods, and permit checking of static types at call sites. Negotiable interfaces model stateful roles with abstract state, and changing method availability as a product of that state. As a result, negotiable interfaces enable formal checking of method availability at call sites. Rather than perform fully static checking, requiring alias analysis or protection schemes, Mentok mixes static checking with dynamic checking, providing a new programming language command, USE, for clients to negotiate with objects.
To demonstrate how negotiable interfaces can model simple stateful roles with changing method availability, the INamedBuffer interface from figure 3.5 has been recast into a negotiable interface in listing 3.8.
The INamedBuffer negotiable interface defines several different state capabilities, called tokens, which correspond to the states of figure 3.5: Unbound, Named, Empty, Full, and Closed. The initial abstract state of the interface is a single Unbound token, and each method is defined to have a behaviour matching the state transitions given in figure 3.5. For example, the Name method (line 9) is specified as requiring a single Unbound token, and producing a single Named token.
The errors from listing 3.3 have been recast in listings 3.8 and 3.9, taking advantage of Mentok’s USE statement. Mentok’s USE statement allows a programmer to test and transform the state of an object to a promised state. At the beginning of a USE statement, the requested state is negotiated from the object, transformed by the body of the USE statement, and the promised state is returned at the end of the statement.
 1
 2 TYPE
 3   INamedBuffer = POINTER TO INTERFACE RECORD
 4     STATE Unbound, Named, Empty,
 5           Full, Closed;
 6     INIT [Unbound];
 7   END;
 8
 9 PROCEDURE (this:INamedBuffer) Name(id:Id)
10   :: [Unbound]->[Named];
11
12 PROCEDURE (this:INamedBuffer) Open()
13   :: [Named]->[Empty];
14
15 PROCEDURE (this:INamedBuffer) Put(i:Item)
16   :: [Empty]->[Full];
17
18 PROCEDURE (this:INamedBuffer) Get() : Item
19   :: [Full]->[Empty];
20
21 PROCEDURE (this:INamedBuffer) Close()
22   :: [Empty]->[Closed];
Listing 3.8: Negotiable interface specification of named buffer.
 1
 2 PROCEDURE InitializeBuffer();
 3 VAR
 4   b : INamedBuffer;
 5 BEGIN
 6   b := NewBuffer();
 7   USE b :: [Unbound] -> [Full] DO
 8     b.Name(someID);   (* b: [Unbound] -> [Named] *)
 9     b.Put(item);      (* state error --          *
10                        * required b: [Empty]     *
11                        * current  b: [Named]     *)
12   END;
13 ...
14
15 PROCEDURE RenameBuffer(b:INamedBuffer);
16 BEGIN
17   USE b :: [Empty]->[Named] DO
18     b.Close();        (* [Empty] -> [Closed]     *)
19     b.Name(someID);   (* state error:            *
20                        * required b: [Unbound]   *
21                        * current  b: [Closed]    *)
22   END;
23 ...
Listing 3.9: Preventing simple errors with negotiable interfaces.
The routine InitializeBuffer creates a buffer and then performs a state check and guard, or USE, on line 7. The body of the USE statement will attempt to Name and populate the buffer. Mentok’s state system now detects that the named buffer will not be in the correct state on line 9. A static state check error is generated and the programmer must correct the error.
The procedure RenameBuffer throws a similar error when it attempts to name a buffer in a [Closed] state. It is worth noting here that the USE statement does not just contain a type-test, but also a type promise. In this case, the USE statement promises to return the buffer in a [Named] state. Closer inspection of the INamedBuffer interface shows that this promise can never be satisfied. That is, the state [Named] is not reachable from an [Empty] state, and Mentok’s state check will reject any body of the USE statement as a result.
Simple state-machine-like roles can easily be specified using traditional pre- and post-condition contracts, of course, some of which can even be checked statically by model checking techniques. Expressing simple method availability is just a first step toward expressing more complex re-entrance conditions, for which method execution must be treated as non-atomic.
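The unreachability claim can be illustrated outside Mentok with a small state-space search. The following Python sketch is not MentokC's algorithm; it only transcribes the method behaviours of listing 3.8 into a table and explores the states reachable from a starting state. The names INAMEDBUFFER, freeze, step and reachable are hypothetical helpers introduced for this illustration, and the search assumes the protocol has finitely many reachable states, which holds for this interface.

```python
from collections import Counter, deque

# Method behaviours transcribed from listing 3.8 as (required, returned) tokens.
INAMEDBUFFER = {
    "Name":  (["Unbound"], ["Named"]),
    "Open":  (["Named"],   ["Empty"]),
    "Put":   (["Empty"],   ["Full"]),
    "Get":   (["Full"],    ["Empty"]),
    "Close": (["Empty"],   ["Closed"]),
}

def freeze(tokens):
    """Canonical, hashable form of a multiset of tokens."""
    return tuple(sorted(tokens))

def step(state, method, protocol=INAMEDBUFFER):
    """State after calling `method` in `state`, or None if it is unavailable."""
    required, returned = protocol[method]
    current, need = Counter(state), Counter(required)
    if any(current[tok] < n for tok, n in need.items()):
        return None  # a required token is missing: the call is a state error
    return freeze(((current - need) + Counter(returned)).elements())

def reachable(start, protocol=INAMEDBUFFER):
    """All states reachable from `start` by any sequence of method calls."""
    seen = {freeze(start)}
    queue = deque(seen)
    while queue:
        state = queue.popleft()
        for method in protocol:
            nxt = step(state, method, protocol)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Put is unavailable in [Named], as on line 9 of listing 3.9.
print(step(freeze(["Named"]), "Put"))
# [Named] cannot be reached from [Empty], so the promise in RenameBuffer fails.
print(freeze(["Named"]) in reachable(["Empty"]))
```

Under these assumptions, any USE body promising [Named] from [Empty] can be rejected without inspecting the body at all, because no sequence of calls produces the promised state.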
3.4.1.1 Specifying Patterns of Re-entrance
Method invocations are not atomic; objects may make recursive calls or out-calls to other objects which may make re-entrant calls in turn. Mentok allows specification of such patterns of calls using two additional kinds of behaviour:
• Re-entrant behaviours for methods allow specification of patterns of re-entrant or recursive calls for a method.
• Cooperative behaviours allow specification of the sequences of out-calls that a method may make during execution.
These two behaviours can be combined to constrain re-entrance for tightly bound interfaces.
Listing 3.10 gives negotiable interface types for the model/view example from section 3.2.1.1. IModel is declared to have two tokens, update and observe, and an initial state of an update token and an observe token. The method Set is declared to only be available when the model is updateable and observable (it consumes [update,observe] but replaces it with [update,observe]), and the method ValueOf is available whenever the model is observable. Set is also declared to have a re-entrant behaviour of [observe]->[observe] (line 8), meaning that during the course of executing Set, the model will be re-entered by a sequence of method calls that have the composite behaviour of [observe]->[observe] (i.e. zero or more calls to ValueOf).
 1
 2 TYPE
 3   IModel = POINTER TO INTERFACE RECORD
 4     STATE update,observe;
 5     INIT [update,observe];
 6   END;
 7
 8 PROCEDURE (this:IModel :: [observe]->[observe]) Set(k : Key; v : Value)
 9   :: [update,observe]->[update,observe];
10
11 PROCEDURE (this:IModel) ValueOf(k : Key) : Value
12   :: [observe]->[observe];
13
14 ...
15
16 TYPE
17   IView = POINTER TO INTERFACE RECORD
18     STATE notify;
19     INIT [notify];
20   END;
21
22 PROCEDURE (this:IView) Notify(k : Key; m : IModel :: [observe]->[observe])
23   :: [notify]->[notify];
Listing 3.10: Interface specification for IModel and IView.
IView is declared to have only one token type, notify, and an initial state of a single notify token. The method Notify can only be called when the view is notifiable (in this case, when the Notify method is not currently executing), and has a cooperative behaviour of the form [observe]->[observe] (line 22) for its model parameter. This means that any implementation of Notify can call into the model, but can only call the method ValueOf.
Listings 3.11 and 3.12 give implementations for the model/view negotiable interfaces. The call to IView.Notify on line 20 of listing 3.11 passes the re-entrancy identifier THISINT to the model’s views.⁶ This helps to explicitly identify out-calls which may result in re-entrance. Also of note, on line 25 of listing 3.12 the error in BrokenView is detected statically, as the conditions for re-entering the model are established by the model parameter’s cooperative behaviour. The burden of dynamic checking for the view is reduced to ensuring that the model parameter passed to Notify is the same as the model being viewed.
Negotiable interfaces can specify some patterns of re-entrance for sets of interfaces that explicitly refer to each other by specification. The next section introduces type layers, which can constrain re-entrance between interfaces that have no explicit relationships.
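The effect of a cooperative behaviour on the body of Notify can be sketched as a check of a call sequence against a granted and a promised state. The Python below is an illustration only, not Mentok's state checker: IMODEL transcribes the method behaviours of listing 3.10, and check_calls is a hypothetical helper that replays the calls made on the model parameter, starting from the granted state [observe].

```python
from collections import Counter

# Method behaviours of IModel from listing 3.10 as (required, returned).
IMODEL = {
    "Set":     (Counter(["update", "observe"]), Counter(["update", "observe"])),
    "ValueOf": (Counter(["observe"]), Counter(["observe"])),
}

def check_calls(granted, promised, calls, protocol=IMODEL):
    """Replay `calls` against the behaviour granted -> promised.

    Returns None if every call is available and the promised state is
    restored, otherwise a message describing the first violation.
    """
    current = Counter(granted)
    for method in calls:
        required, returned = protocol[method]
        if any(current[tok] < n for tok, n in required.items()):
            return (f"{method}: required {sorted(required.elements())}, "
                    f"actual {sorted(current.elements())}")
        current = (current - required) + returned
    if current != Counter(promised):
        return f"promised {sorted(promised)}, actual {sorted(current.elements())}"
    return None

# A well-behaved view only observes the model.
print(check_calls(["observe"], ["observe"], ["ValueOf"]))
# A view that also tries to update the model is rejected at the Set call.
print(check_calls(["observe"], ["observe"], ["ValueOf", "Set"]))
```

The second call sequence fails because the cooperative behaviour never grants an update token, which is the same reason the BrokenView implementation is rejected.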
6 The THISINT keyword is actually syntactic sugar for casting the receiver parameter to the type of the explicitly implemented interface. This is covered further in chapter 4.
 1
 2 TYPE
 3   Model = POINTER TO RECORD (+IModel)
 4     views: IViewList;
 5     table: Table;
 6   END;
 7
 8 PROCEDURE (this:Model.IModel :: [observe]->[observe])
 9   Set(k : Key; v : Value) :: [update,observe]->[update,observe];
10 VAR
11   i : INTEGER;
12   view : IView;
13 BEGIN
14   (* THISINT :: [observe] *)
15   this.table.SetValue(k,v); (* set the value *)
16   FOR i := 0 TO this.views.Count()-1 DO
17     view := this.views.view(i);
18     USE view :: [notify]->[notify] DO
19       (* THISINT :: [observe] *)
20       view.Notify(k,THISINT);
21       (* THISINT :: [observe] *)
22     END;
23   END;
24   (* THISINT :: [observe] *)
25 END Set;
26
27 PROCEDURE (this:Model.IModel) ValueOf(k : Key) : Value
28   :: [observe]->[observe];
29 BEGIN
30   IF this.table.Contains(k) THEN
31     RETURN this.table.Value(k); (* return value *)
32   ELSE
33     RETURN NIL; (* return NIL *)
34   END;
35 END ValueOf;
36
37 ...
Listing 3.11: Implementation of IModel.
 1
 2 TYPE
 3   BrokenView = POINTER TO RECORD (+IView)
 4     model : IModel;
 5     display: IDisplay;
 6   END;
 7
 8 PROCEDURE (this:BrokenView.IView)
 9   Notify(k : Key; m : IModel :: [observe]->[observe])
10   :: [notify]->[notify];
11 VAR
12   newVal : Value;
13 BEGIN
14   ASSERT(this.model = m);
15   (* m :: [observe] *)
16   newVal := m.ValueOf(k); (* m :: [observe] -> [observe] *)
17   (* m :: [observe] *)
18   IF this.Censors(k) THEN
19     IF newVal = NIL THEN
20       RETURN; (* Censored removed. *)
21     ELSE
22       (* TYPE ERROR:                     *
23        * Required :: [update,observe]    *
24        * Actual   :: [observe]           *)
25       m.Set(k,NIL);
26     END
27   ELSE
28     ...
29   END;
30 END Notify;
31
32 ...
33
34 TYPE
35   View = POINTER TO RECORD (+IView)
36     model : IModel;
37     display: IDisplay;
38   END;
39
40 PROCEDURE (this:View.IView)
41   Notify(k : Key; m : IModel :: [observe]->[observe])
42   :: [notify]->[notify];
43 VAR
44   newVal : Value;
45 BEGIN
46   ASSERT(this.model = m);
47   (* m :: [observe] *)
48   newVal := m.ValueOf(k); (* m :: [observe]->[observe] *)
49   (* m :: [observe] *)
50   IF (newVal = NIL) THEN
51     ...
52   END;
53   (* m :: [observe] *)
54 END Notify;
Listing 3.12: Implementations of IView.
3.4.2 Type Layers: Dynamic Interface Availability
Modules were originally proposed as a means of specifying a “part” of a program, so that other pieces of software could use a module without additional information [Par72]. Modules and their interfaces are now used as a mechanism to specify the architecture of applications and, as programming language abstractions, the specifications provided by modules can be enforced by compilers and other software tools.
Layering is a well-known pattern [BMR+96] that groups software abstractions into equivalence classes that have the same sets of dependencies. Layering is a useful architectural technique as it helps organize modules into hierarchies and describe the abstract control flow of an architecture.
At the programming language level, layering of modules is expressed in module signatures as static dependencies; this is generally sufficient for monolithic, non-extensible applications, but for component-based applications, which are incomplete by definition, there are no mechanisms for checking and enforcing layering of static dependencies for all configurations of components. This issue is exacerbated by the approach of using interfaces as “specifications in the middle” between components, as components may reference a range of statically defined interfaces and break the intended layering between them.
Type layers are an extension to module signatures that allow programmers to specify a layering, a partial order, over modules and negotiable interfaces that specifies abstract control flow through interfaces: method calls from one negotiable interface to another must always go “down-layer”. The specified control flow is then enforced using two mechanisms:
• A check is performed at USE negotiations to ensure control flow between negotiable interfaces respects the layering. Negotiation checking can be performed both statically and dynamically, depending on the known state of the program.
• A composition check of type layers ensures that no component breaks layering by introducing cycles; cycles make the layering meaningless.
Combined, these two checks ensure that control flow follows the abstract specification defined by the type layers of an application. The composition check ensures freedom from cycles so that layering is meaningful. The control-flow check afforded by negotiable interfaces ensures that the layered control-flow is respected.
 1 COMPONENT MODULE App;
 2
 3 COMPOSE Presentation,Logic,Data;
 4
 5 COMPONENT LAYER
 6   Data < Logic,
 7   Logic < Presentation;
 8
 9 TYPE LAYER
10   (* Local layering *)
11   IPostMessage < IApp,
12   IApp < IAppRoot,
13   Presentation.IWindow < IApp,
14   IPostMessage < Presentation.IWindowItem;
15
16 ...
17
18 END App;
19
20 COMPONENT MODULE Presentation;
21
22 TYPE LAYER
23
24   IWindowItem < IWindow;
25
26 ...
27
28 END Presentation;
Listing 3.13: Type layer declarations.
Listing 3.13 demonstrates a set of layer declarations for the application architecture originally given in section 3.2.2. Line 5 gives examples of coarse-grained, component layering specifications; such specifications indicate the direction of control flow through the interfaces defined in the layered modules. In the example, the layering specifies that the Logic module is layered above the Data module, meaning that the components implementing interfaces from the Logic module may call into interfaces defined by the Data module; components implementing Data interfaces cannot call into Logic interfaces. The layering relationship is transitive, so the Presentation module is layered above the Data layer as well as the Logic layer.
Line 9 gives examples of fine-grained layering of individual interfaces. Fine-grained type layers specify a control-flow dependency between interfaces. For example, the interface IPostMessage from the Application module is layered below the IWindowItem interface from the Presentation module. The intent behind this layering is as follows: a window item can post a message to the application message queue, but the message must be handled asynchronously to allow the window item to finish its current method call without being re-entered.
Type layers may be used to visualize the abstract architecture of a component or application. Programmers can use type layer visualizations as an aid when learning or designing a new system. Figure 3.9 is actual output from MentokC for the code sample in listing 3.13.⁷ The layered design of the application is visualized, with the Presentation module layered above the Logic module, which in turn is layered above the Data module. The Application module provides two types, IAppRoot and IApp, that are placed at the top of the layering, and another, IPostMessage, that is placed below some Presentation layer types. This reflects the ability of the Presentation layer to post work items to an application message queue, and the guarantee that posting a message will never result in a call-back into the Presentation layer.
7 Some detail has been hand trimmed from this diagram for brevity.
Figure 3.9: Type layer application structure.
Figure 3.10: Up layer call.
Figure 3.11: Cross layer call.
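The composition check can be pictured as a cycle test over the declared “lower < upper” pairs. The Python sketch below is an assumption-laden illustration rather than MentokC's implementation: TYPE_LAYERS transcribes only the fine-grained type layer declarations of listing 3.13 (the names declared inside the Presentation module are qualified here for uniqueness), and compose is a hypothetical helper built on the standard library's graphlib module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# (lower, upper) pairs transcribed from the type layer declarations of listing 3.13.
TYPE_LAYERS = [
    ("IPostMessage", "IApp"),
    ("IApp", "IAppRoot"),
    ("Presentation.IWindow", "IApp"),
    ("IPostMessage", "Presentation.IWindowItem"),
    ("Presentation.IWindowItem", "Presentation.IWindow"),
]

def compose(layer_pairs):
    """Return one bottom-up ordering of the layers, or raise CycleError."""
    sorter = TopologicalSorter()
    for lower, upper in layer_pairs:
        sorter.add(upper, lower)  # `lower` must be ordered before `upper`
    return list(sorter.static_order())

# The declared layering is acyclic, so an ordering exists.
print(compose(TYPE_LAYERS))

# A component that layers IAppRoot below IPostMessage introduces a cycle.
try:
    compose(TYPE_LAYERS + [("IAppRoot", "IPostMessage")])
except CycleError as err:
    print("composition rejected:", err.args[0])
```

A topological sort is used here simply because it fails exactly when the declarations contain a cycle; any other cycle detection over the same pairs would serve the same purpose.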
3.4.2.1 Constraining Control Flow and Re-entrance
Section 3.2.2.1 examined an erroneous case of re-entrance that occurred between objects implementing loosely-bound interfaces. The error occurred because a layering between types (IPostMessage and IDataSource) was broken. Type layers can be used to specify such constraints and, being part of the type system of a programming language, can be checked. Type layer checking ensures that program control flow follows the abstract design of type layers: a program can only call “downwards” through the layering at any given time. Note that it is still possible for an object to implement types from different layers, which means that object re-entrance is still possible; however, patterns of cyclic calls back to interfaces from higher layers are prevented. Type layers constrain the availability of the interfaces declared by a program and thus constrain the actions that a re-entrant call may initiate.
Listing 3.14 revisits the problem from section 3.2.2.1, using the type layering from figure 3.9. The Post method is prevented from initiating cyclic calls to the window (line 41, illustrated in figure 3.10) and from calling across layers to close the data source (line 46, illustrated in figure 3.11). In both cases, it is possible to statically detect that the Post method is attempting to call up or across layers. If the current layer is not known statically, type layers can be tested and guarded by extending the semantics of USE to include a layer check ensuring that the dynamic call ordering by layers is respected.
 1 TYPE
 2   IApp = POINTER TO INTERFACE RECORD
 3     STATE run,end;
 4     INIT [run];
 5   END;
 6
 7   IPostMessage = POINTER TO INTERFACE RECORD
 8     STATE post;
 9     INIT [post];
10   END;
11
12 PROCEDURE (this:IApp) MsgLoop() :: [run]->[end];
13
14 PROCEDURE (this:IPostMessage) Post(m:IMsg) :: [post]->[post];
15
16 ...
17
18 TYPE
19   Application = POINTER TO RECORD (+IApp + IPostMessage)
20     window : IWindow;
21     dataSource : IDataSource;
22     msgs : MsgList;
23   END;
24
25 PROCEDURE (this:Application.IPostMessage) Post(m : IMsg)
26   :: [post]->[post];
27 VAR
28   d: IDataSource;
29   w: IWindow;
30 BEGIN
31   (* Layer = IPostMessage *)
32   ASSERT(m # NIL);
33   IF m.Type() = Close THEN
34     (* Shortcut - free everything and shutdown now. *)
35     w := this.window;
36     (* LAYER ERROR:              *
37      * Requested : IWindow       *
38      * Actual    : IPostMessage  *)
39     USE
40       w :: [active]->[inactive] DO this.window.Close()
41     END;
42     d := this.dataSource;
43     (* LAYER ERROR:              *
44      * Requested : IDataSource   *
45      * Actual    : IPostMessage  *)
46     USE d :: [bound]->[free] DO this.dataSource.Free() END;
47     HALT(0);
48   ELSE
49     (* otherwise add the message to the queue *)
50     this.msgs.Enqueue(m);
51   END;
52 END Post;
Listing 3.14: Loosely bound re-entrance.
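The distinction between down-layer, up-layer and cross-layer calls can be sketched by taking the transitive closure of the declared layering and classifying a call by where its target sits relative to the current layer. The Python below is an illustrative assumption, not the check performed by MentokC: TYPE_LAYERS transcribes the type layer declarations of listing 3.13, layers_below and classify_call are hypothetical helpers, and the interaction with coarse-grained component layers is deliberately left out.

```python
# (lower, upper) pairs transcribed from the type layer declarations of listing 3.13.
TYPE_LAYERS = [
    ("IPostMessage", "IApp"),
    ("IApp", "IAppRoot"),
    ("Presentation.IWindow", "IApp"),
    ("IPostMessage", "Presentation.IWindowItem"),
    ("Presentation.IWindowItem", "Presentation.IWindow"),
]

def layers_below(layer_pairs):
    """Map each layer to the set of layers strictly below it (transitively)."""
    direct = {}
    for lower, upper in layer_pairs:
        direct.setdefault(upper, set()).add(lower)
        direct.setdefault(lower, set())
    closure = {}
    for layer, lowers in direct.items():
        seen, stack = set(), list(lowers)
        while stack:
            nxt = stack.pop()
            if nxt not in seen:
                seen.add(nxt)
                stack.extend(direct[nxt])
        closure[layer] = seen
    return closure

def classify_call(current, target, closure):
    """Classify a call from layer `current` into interface `target`."""
    if target == current:
        return "same layer (not modelled here)"
    if target in closure.get(current, set()):
        return "down-layer: allowed"
    if current in closure.get(target, set()):
        return "up-layer: rejected"
    return "cross-layer: rejected"

BELOW = layers_below(TYPE_LAYERS)
# The two negotiations attempted by Post in listing 3.14:
print(classify_call("IPostMessage", "Presentation.IWindow", BELOW))
print(classify_call("IPostMessage", "IDataSource", BELOW))
# A window item posting a message calls downwards:
print(classify_call("Presentation.IWindowItem", "IPostMessage", BELOW))
```

In this sketch IDataSource has no declared relationship to IPostMessage, so the call is treated as crossing layers, in line with figure 3.11.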
3.5 Summary
Object-based approaches to component programming are affected by problems of re-entrance and inter-object consistency. The static type systems of object-based and object-oriented languages do not allow specification and constraint of behaviours leading to such problems, however. Mentok provides features that allow specification and constraint of control flow through objects, namely negotiable interfaces and type layers. The next two chapters describe, in detail, negotiable interfaces and type layers in Mentok.
4 Negotiable Interfaces
That statement is either so deep it would take a lifetime to fully comprehend every particle of its meaning, or it is a load of absolute tosh. Which is it, I wonder? (Terry Pratchett, Hogfather)
Traditional object-oriented interfaces are the basis for component specifications but can only express and enforce simple wiring safety properties, such as preventing “message not-understood” errors and ensuring data values are passed to and from method invocations. This level of specification is insufficient for stateful component roles, correct implementation of which requires any given implementation to encapsulate and maintain state during and between method calls. Consistency of this state often relies on correct interactions between clients and objects, and when method execution is non-atomic, object re-entrance can expose intermediate state to clients.
Component-oriented programming has a different deployment paradigm and requires a new programming discipline. Component applications are extensible, and thus incomplete by definition, meaning that component instances, or objects, cannot necessarily be tested in composition before deployment. Object re-entrance is a global system property, however, meaning that it can occur across component boundaries. Specification of components must, therefore, include specification of re-entrance conditions to enable a component to be proofed against re-entrance before deployment.
This chapter presents negotiable interfaces, a language feature of Mentok that enables specification of some object re-entrance conditions. Negotiable interfaces in Mentok have been fully implemented by MentokC, the Mentok compiler. This chapter also presents MentokP, a subset of Mentok that was created and formalized using Isabelle/HOL [Wen00, NPW00]. MentokP models enough of Mentok to enable investigation of the safety properties of negotiable interfaces but is not intended to be a usable programming language itself.
Negotiable interfaces are enhanced interface types that allow specification of method availability as a product of temporal object state, including out-calls and re-entrance. Negotiable interfaces have a dynamic representation that can be tested at runtime via a mechanism called negotiation. A static state checking step ensures that clients of Mentok objects respect the method availability protocols specified by negotiable interfaces to help prevent errors that may otherwise occur after deployment.
The next section gives an informal description of the syntax, semantics, and type system features of negotiable interfaces, as implemented in Mentok. Section 4.2 describes a formalization of a subset of Mentok called MentokP, which was created to investigate safety properties of negotiable interfaces. Section 4.3 discusses how the Mentok compiler, MentokC, and the Mentok runtime libraries implement negotiable interfaces. Section 4.4 discusses some problems with and extensions to negotiable interfaces in Mentok, while section 4.5 discusses related work.
This chapter assumes a basic familiarity with Component Pascal syntax, as well as the shorthand and concepts described in section 3.1.
4.1 Negotiable Interfaces in Mentok
Negotiable interfaces in Mentok can be broken down into four pieces: negotiable interface types, negotiable objects, dynamic negotiation, and static state checking.
1. Negotiable interface types are object-oriented interfaces extended with a protocol that specifies method availability based on abstract state. Unlike other protocol-based approaches to specifying component behaviours, negotiable interface protocols do not treat methods as atomic operations, and include explicit notions of out-calls and re-entrance.
2. Negotiable objects are objects that provide operations via negotiable interfaces. At runtime, the abstract state based portion of an object’s negotiable interface type or types is reified. This abstract state is transformed as clients perform operations via the object’s negotiable interfaces.
3. Negotiation is a mechanism provided by the Mentok USE statement that allows clients to perform complex operations with negotiable objects. A negotiation begins with a request and promise by a client to perform a certain set of operations on a negotiable object. If the negotiation succeeds, the client is guaranteed to be able to perform those operations; otherwise the client is given no guarantees. Clients are statically prevented from performing illegal operations (i.e. operations on an object without a guarantee) by Mentok’s state system.
4. Mentok’s state system, an extension to traditional static type systems, statically models the state of references to negotiable objects to check client code of negotiable interfaces. State checking statically rejects any program in which a client attempts to perform an operation on an object when the state of the object is unknown, or is known to be different to the required state of the operation. State checking also ensures that clients of objects satisfy the obligations of negotiation, as well as re-entrance and out-calls as specified by negotiable interface types.
Subsection 4.1.1 introduces the concepts, syntax and diagrammatic representation of negotiable interface types, subsection 4.1.2 discusses negotiable objects and negotiation, and subsection 4.1.3 describes Mentok’s state system.
4.1.1 Negotiable Interface Types
Negotiable interface types are interfaces enhanced with an abstract state-based protocol. That is, the temporal behaviour of a negotiable interface is modeled by a dynamically changing abstract state that models the availability of methods. The methods of a negotiable interface in turn specify how the abstract state of the interface is modified. In addition to specifying how an operation changes the state of an interface as a result of execution, negotiable interface methods can also specify self-interference patterns and out-calls that occur during the course of execution. This allows a programmer to declaratively specify complex interactions between objects.
The abstract state of a negotiable interface type is represented by a multiset of tokens, the basis set of which is declared by the negotiable interface type. Each negotiable interface also specifies an initial state, which will represent the abstract state of a newly created negotiable object. Modeling the abstract state of negotiable interface protocols using a factorable representation allows state to be split apart and transformed independently.
The methods of a negotiable interface specify how the abstract state is transformed, using multiset transformations called behaviours. A behaviour is a pair, written B_REQ->B_RET, where the required state, B_REQ, and the returned state, B_RET, are both multisets of tokens. A behaviour can be applied to an interface state, S (also a multiset of tokens), if the required state B_REQ is a submultiset of S. Applying a behaviour to S transforms S by subtracting B_REQ from S and then adding B_RET to the result, using multiset addition and subtraction. There are three kinds of behaviour a method can specify:
• Method behaviours specify how a method call transforms the state of an object as the result of call and execution.
• Re-entrant behaviours specify intermediate state transformations during a method invocation that occur as the result of re-entrant or recursive calls.
• Cooperative behaviours specify out-calls that must be made to cooperating objects during the course of execution.
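The multiset arithmetic behind behaviour application can be shown with a minimal Python sketch, using collections.Counter as the multiset. This is an illustration of the definition only; the helper names state, can_apply and apply_behaviour are hypothetical and are not part of Mentok or its runtime.

```python
from collections import Counter

def state(*tokens):
    """Build an abstract state: a multiset of token names."""
    return Counter(tokens)

def can_apply(required, current):
    """True if `required` is a submultiset of `current`."""
    return all(current[tok] >= n for tok, n in required.items())

def apply_behaviour(behaviour, current):
    """Apply (required, returned) to `current`: subtract, then add."""
    required, returned = behaviour
    if not can_apply(required, current):
        raise ValueError(
            f"required {sorted(required.elements())} is not a submultiset "
            f"of {sorted(current.elements())}")
    return (current - required) + returned

# A behaviour that consumes [update, observe] and returns only [observe].
consume_update = (state("update", "observe"), state("observe"))

s = state("update", "observe", "observe")
s = apply_behaviour(consume_update, s)
print(sorted(s.elements()))  # the remaining tokens after one application

# The update token is gone, so the same behaviour is no longer applicable.
print(can_apply(consume_update[0], s))
```

Because states are multisets rather than sets, a state may hold several copies of the same token, and a behaviour consumes only as many copies as it requires.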
Finally, negotiable interface inheritance uses a simple interpretation of behavioural subtyping that permits the protocols of negotiable interfaces to be aggregated and extended. Negotiable interfaces may add additional tokens and behaviours to an inherited protocol, but may not remove tokens or change behaviours of existing methods. Interfaces that inherit from multiple interfaces can combine the protocols of inherited interfaces by performing token aliasing. Token aliasing equates tokens from inherited interfaces, thereby equating states of the inherited protocols and allowing them to interact.
Section 4.1.1.1 describes the syntax for declaring simple negotiable interface types and describes the diagrammatic representation of negotiable interface protocols, while section 4.1.1.2 describes negotiable interface inheritance.
4.1.1.1 Simple Negotiable Interface Types
Figure 4.1 gives Mentok’s syntax for declaring negotiable interface types without subtyping. The full syntax with subtyping declarations is given in the following section, which treats negotiable interface inheritance. A negotiable interface type declaration (in the TypeDef production) consists of a name, a token set declaration, an initial state (from the InterfaceBody production), and a set of method declarations (the MethodDec production).¹ Unlike C# or Java, methods of an interface are not declared in a specific lexical scope. Instead, each method has a receiver parameter which specifies the type that owns the method, as in Oberon and Component Pascal. Negotiable interface method declarations allow optional specification of behaviours in three distinct places:
1. The optional behaviour specified at the end of the declaration is the method behaviour, which specifies how the abstract state of the interface is transformed as a result of executing the method.
2. The optional behaviour on the receiver parameter (via the Parameter production) declares the re-entrant behaviour of the method. The re-entrant behaviour of a method specifies a pattern of re-entrant or recursive calls that will be made as a result of calling a method.
3. The optional behaviour on each formal parameter of the method (via the Parameter production) declares a cooperative behaviour. Cooperative behaviours can only be specified on parameters with negotiable interface types, and are behaviours that apply to that parameter. During the course of executing the method, the method must satisfy the cooperative behaviour by calling a pattern of methods on that parameter.
1 The Token production, which states that tokens are identifiers, is included to disambiguate identifiers for tokens from identifiers for type names, parameter names, etc.
Token ::= Ident
State ::= "[" [ Token { "," Token } ] "]"
TypeIdent ::= Ident [ TypeIdent "." TypeIdent ]
Behaviour ::= State "->" State
InterfaceBody ::= "INTERFACE" "RECORD"
                  "STATE" [ Token { "," Token } ] ";"
                  "INIT" State ";"
                  "END"
Type ::= InterfaceBody | RecordBody | ...
TypeDef ::= Ident "=" Type
Parameter ::= Ident ":" TypeIdent [ "::" Behaviour ]
Parameters ::= [ Parameter { ";" Parameter } ]
MethodDec ::= "PROCEDURE" "(" Parameter ")" Ident "(" Parameters ")"
              [ ":" TypeIdent ] [ "::" Behaviour ] ";"
Figure 4.1: Syntax for simple negotiable interface declarations (no subtyping).
Listing 4.1 gives two examples of negotiable interface type declarations:
• The STATE declaration specifies a set of tokens for an interface. On line 3 of listing 4.1, the interface IFoo declares four tokens, a, b, i1 and i2, which represent the capabilities or state for IFoo’s protocol.
• The INIT declaration specifies the initial state for an interface. A state is a multiset of the tokens declared by the interface. Line 4 of listing 4.1 shows the initial state, [a], for the interface IFoo. Similarly, a newly instantiated object that implements the IBar interface will be in the state [X,X] (line 9).
• The method behaviour on line 13 specifies that the method IFoo.Foo1 can only be executed when the interface has a token a, which will be replaced by the token b when the method has finished executing.
• Method IFoo.Foo2 on line 15 has the re-entrant behaviour [i1]->[i2]. During execution of IFoo.Foo2, an object will be re-entered by, or will recursively call, a method or sequence of methods that transforms its intermediate state from i1 to i2. For interface IFoo, this re-entrant behaviour can only be satisfied by a single call to IFoo.Foo3. This intermediate state transformation occurs as a result of the outer transformation [b]->[a] specified as the method behaviour of IFoo.Foo2.
 1 TYPE
 2   IFoo = POINTER TO INTERFACE RECORD
 3     STATE a,b,i1,i2;
 4     INIT [a];
 5   END;
 6
 7   IBar = POINTER TO INTERFACE RECORD
 8     STATE X,Y;
 9     INIT [X,X];
10   END;
11
12
13 PROCEDURE (this:IFoo) Foo1() :: [a]->[b];
14
15 PROCEDURE (this:IFoo::[i1]->[i2]) Foo2() :: [b]->[a];
16
17 PROCEDURE (this:IFoo) Foo3() :: [i1]->[i2];
18
19 PROCEDURE (this:IBar) Bar1() :: [X,X]->[Y];
20
21 PROCEDURE (this:IBar) Bar2() :: [Y]->[X,X];
22
23 PROCEDURE (this:IBar) Bar3(f:IFoo::[a]->[b]) :: [X]->[];
Listing 4.1: A pair of basic negotiable interface declarations.
Re-entrant behaviours are optional, since not all methods are self-interfering; however, re-entrant behaviours can only be specified for stateful methods (methods with behaviours), since a method that undergoes an intermediate, observable state transformation is stateful by definition.
• Method IBar.Bar3 (on line 23) has the cooperative behaviour [a]->[b] on a parameter named f, which is of type IFoo. IBar.Bar3 can only be called if it is passed an object which holds a token, a, which as a result of execution will be transformed to b by a sequence of out-calls. This cooperating state transformation occurs as a result of the outer transformation [X]->[] specified as the method behaviour of IBar.Bar3.
Cooperative behaviours can only be declared for parameters of negotiable interface type, and are optional.2
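The interplay between the method behaviour and the re-entrant behaviour of IFoo.Foo2 can be illustrated with a small Python model of listing 4.1. This is a sketch under assumptions, not Mentok semantics: the class FooObject, the callback parameter body, and the choice to expose the intermediate state [i1] in the same multiset as the object state are all made up for the example.

```python
from collections import Counter

class FooObject:
    """Toy model of an object implementing IFoo from listing 4.1."""

    def __init__(self):
        self.state = Counter(["a"])                  # INIT [a]

    def _transform(self, required, returned):
        # Apply required -> returned, failing if the tokens are not held.
        if any(self.state[t] < n for t, n in required.items()):
            raise RuntimeError(f"tokens {sorted(required.elements())} not available")
        self.state = self.state - required + returned

    def foo1(self):                                  # :: [a]->[b]
        self._transform(Counter(["a"]), Counter(["b"]))

    def foo3(self):                                  # :: [i1]->[i2]
        self._transform(Counter(["i1"]), Counter(["i2"]))

    def foo2(self, body):                            # (this::[i1]->[i2]) :: [b]->[a]
        # Assumption: on call, [b] is consumed and the intermediate state [i1] appears.
        self._transform(Counter(["b"]), Counter(["i1"]))
        body(self)                                   # re-entrant calls happen here
        # Assumption: on return, [i2] must have been reached before [a] is returned.
        self._transform(Counter(["i2"]), Counter(["a"]))

obj = FooObject()
obj.foo1()                                           # [a] -> [b]
obj.foo2(lambda o: o.foo3())                         # re-entrant call to Foo3
final_state = obj.state
```

In this model, a body that does not call foo3 leaves the intermediate state at [i1], so the return step of foo2 fails; this corresponds to the re-entrant behaviour not being satisfied.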
The protocol of a negotiable interface type is represented diagrammatically as follows:
2 Cooperative behaviours can be declared for methods without behaviours as well as static and nested procedures, in which case they are called stateful parameters since they do not necessarily involve cooperating state transformations.
1. The initial state of the interface is represented as a double circle labeled with the initial state.
2. If a method can be applied to a given state in the protocol, an arrow labeled with the method name is drawn from the given state to a circle labeled with the state obtained by applying the method’s behaviour. If the new state is not the initial state it is drawn as a single labeled circle.
• Re-entrant state transformations are split into four parts:
  – the invocation arrow, represented as a dashed line with the arrow head pointing from the beginning state;
  – the intermediate states, represented as labeled boxes;
  – the intermediate state transformations, which may be method, re-entrant or cooperative transformations;
  – and the return arrow, represented as a dashed line with the arrow head pointing to the finishing state.
• Cooperative state transformations are shadowed in as dotted transformations “behind” the method transformation they occur within. Cooperative state transformations may be elaborated or unelaborated.
  – Elaborated state transformations illustrate each method call and state for the cooperative state transformation. Each arrow in an elaborated state transformation is labeled with a parameter name, interface name and method name.
  – Unelaborated state transformations show only a single arrow from the in-state to the out-state of the cooperative behaviour. The arrow is labeled with the parameter name and type.
• Arrows that point into empty space indicate that only part of the protocol has been displayed. This could be due to the protocol having an infinite number of legal states.
The interfaces IFoo and IBar from listing 4.1 are represented diagrammatically in figures 4.2 and 4.3 respectively. Interface IFoo has a method, IFoo.Foo2, that has a re-entrant behaviour that must be satisfied by calling IFoo.Foo3. Method IBar.Bar3 of IBar has a cooperative behaviour that can be satisfied by calling method IFoo.Foo1.
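The states and arrows of such a protocol diagram can be enumerated mechanically from the initial state and the method behaviours. The following Python sketch does this for IBar by breadth-first search; it is an illustration only, it ignores the cooperative behaviour on Bar3, and the function name and the limit parameter (a guard for protocols with infinitely many states) are assumptions of the example.

```python
from collections import Counter, deque

def reachable_states(init, behaviours, limit=50):
    """Explore a protocol graph breadth-first; stop expanding once `limit` states are seen."""
    key = lambda c: tuple(sorted(c.elements()))      # hashable form of a multiset
    seen = {key(init)}
    edges = []
    queue = deque([init])
    while queue and len(seen) < limit:
        state = queue.popleft()
        for name, (req, ret) in behaviours.items():
            # A method is available when its required state is a submultiset.
            if all(state[t] >= n for t, n in req.items()):
                nxt = state - req + ret
                edges.append((key(state), name, key(nxt)))
                if key(nxt) not in seen:
                    seen.add(key(nxt))
                    queue.append(nxt)
    return seen, edges

# Method behaviours of IBar, copied from listing 4.1.
ibar = {
    "Bar1": (Counter(["X", "X"]), Counter(["Y"])),
    "Bar2": (Counter(["Y"]), Counter(["X", "X"])),
    "Bar3": (Counter(["X"]), Counter()),
}
states, edges = reachable_states(Counter(["X", "X"]), ibar)
```

For IBar the search finds the four states [X,X], [Y], [X] and [], with the empty state having no outgoing arrows.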
Figure 4.2: IFoo protocol with reentrant behaviour.
Figure 4.3: IBar protocol with cooperative behaviour.
4.1.1.2 Negotiable Interface Inheritance
A negotiable interface can extend or inherit from zero or more negotiable interfaces, provided that no cycles are introduced into the interface type hierarchy. A negotiable interface inherits the methods and token sets of its parents, but may not change the existing signatures or behaviours of inherited methods, nor remove tokens. This restriction ensures that negotiable interfaces are substitutable for their parent types with respect to Mentok’s type and state system. Negotiable interfaces may, however, declare new methods and new tokens to effectively extend inherited protocols. Parent protocols can also be combined by a sub-interface by declaring an alias token, which creates an equivalence class for inherited tokens.
Token ::= Ident
TokenDec ::= Token [ ":=" "{" TypeIdent "." Token { "," TypeIdent "." Token } "}" ]
TypeIdent ::= Ident [ TypeIdent "." TypeIdent ] | ...
State ::= "[" [ Token { "," Token } ] "]"
Behaviour ::= State "->" State
InterfaceBody ::= "INTERFACE" "RECORD" [ "(" { "+" TypeIdent } ")" ]
                  "STATE" [ TokenDec { "," TokenDec } ] ";"
                  "INIT" State ";"
                  "END"
Type ::= InterfaceBody | RecordBody | ...
TypeDef ::= Ident "=" Type
Parameter ::= Ident ":" TypeIdent [ "::" Behaviour ]
Parameters ::= [ Parameter { ";" Parameter } ]
MethodDec ::= "PROCEDURE" "(" Parameter ")" Ident "(" Parameters ")"
              [ ":" TypeIdent ] [ "::" Behaviour ] ";"
Figure 4.4: Syntax for declaring a negotiable interface type (with subtyping).
Figure 4.4 gives the complete syntax for declaring negotiable interfaces, including syntax for negotiable interface inheritance. There are three things of note:
• As in GPCP, a parenthesized list of interface type names, each prefixed by the character ’+’, declares the list of inherited interfaces (in the InterfaceBody production).3
• Interfaces can declare new methods with new behaviours. The syntax for method declaration is unchanged, but the new methods can have behaviours that operate over the entire inherited token set, effectively extending inherited protocols with extra arrows.
• Token declarations (also in the InterfaceBody production) take two forms. A token identifier by itself is a new token declaration, which adds a new token to the inherited token set. The second form of token declaration is the alias token, which is declared as follows:
    alias := { IParent1.t1, IParent2.t2, ... }
Alias tokens are an equivalence class across one or more inherited tokens, and a name for the equivalence class for the negotiable interface protocol. Tokens in the alias equivalence class are qualified by their parent interface name in order to prevent name clashes. In the code snippet above, the alias token is named “alias”, and it aliases, or names an equivalence class for, token “t1” from interface “IParent1”, “t2” from interface “IParent2”, and so on. Aliases for an equivalence class of size one can be used to rename tokens within the namespace of the new interface. Aliases for more than one inherited token can be used to combine inherited protocols, as potential states of protocols are made equivalent. Aliases are only known in the namespace of the child interface; the renaming does not affect parent interfaces, or client code of parent interfaces. The following checks are performed for token aliases:
  – In the namespace of the new interface, each inherited token must be known by one unique name. If the parents of an interface have a common ancestor, and tokens from the ancestor have been aliased independently by the parents, the new interface must declare new aliases to ensure that those tokens have a single name, and a single equivalence class.
  – Tokens from the same interface may not be placed in the same equivalence class by an alias. If permitted, this could have an unintended effect on local negotiations (discussed in the next chapter).
3 Records (classes) are similarly declared to implement a list of interfaces, but optionally may include a type name at the front of the list that specifies the base class.
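The two alias checks can be pictured as a validation pass over the alias declarations of a child interface. The Python sketch below is an assumption about how such a pass might look, not a description of MentokC; the function names, the pair representation of qualified tokens, and the extra tokens u and t3 are hypothetical, while the IParent and t names follow the alias snippet above.

```python
def build_alias_map(aliases):
    """Map each qualified parent token (interface, token) to its alias name.

    `aliases` maps an alias name to a list of (interface, token) pairs.
    """
    mapping = {}
    for alias, members in aliases.items():
        interfaces = [iface for iface, _ in members]
        # Check: tokens from the same interface may not share an equivalence class.
        if len(set(interfaces)) != len(interfaces):
            raise ValueError(f"alias {alias!r} equates two tokens of one interface")
        for member in members:
            # Check: each inherited token is known by one unique name.
            if member in mapping:
                raise ValueError(f"{member} already has the name {mapping[member]!r}")
            mapping[member] = alias
    return mapping

def rename_state(interface, state, mapping):
    """Rewrite a parent-interface state (a list of tokens) into the child's namespace."""
    return sorted(mapping.get((interface, tok), tok) for tok in state)

mapping = build_alias_map({"alias": [("IParent1", "t1"), ("IParent2", "t2")]})
renamed = rename_state("IParent1", ["t1", "u"], mapping)
```

Because both t1 and t2 are rewritten to the same name, a state of either parent protocol that holds its aliased token becomes indistinguishable in the child, which is how the sketch models the combination of inherited protocols.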
Listing 4.2 and figure 4.5 give the source listing and protocol diagrams for three simple negotiable interfaces, ISizeable, IHost, and IItem, representing roles for sizeable, host, and hostable widgets respectively.4 Listing 4.3 gives the source listing for an interface, IHostableContainer, that inherits from ISizeable, IHost, and IItem. IHostableContainer is a sizeable host that is a hostable item itself, but is only sizeable when its set of hosted children is locked, and when it has a parent host to give it a size. This restriction is specified via the token alias on line 5, which declares an equivalence class named closed between the inherited tokens closed, hosted, and sizable. Finally, IHostableContainer declares a new method for laying out its children, called Layout, that can only be called when it is hosted and locked. Figure 4.6 gives the protocol for IHostableContainer, which combines
4 Some details of the method signatures for these interfaces, including cooperative and re-entrant behaviours, have been omitted for brevity.
TYPE
  ISizeable = POINTER TO INTERFACE RECORD
    STATE sizable;
    INIT [sizable];
  END;

  IHost = POINTER TO INTERFACE RECORD
    STATE closed,open,item;
    INIT [closed];
  END;

  IItem = POINTER TO INTERFACE RECORD
    STATE hosted,unbound;
    INIT [unbound];
  END;

PROCEDURE (this:ISizeable) Resize :: [sizable]->[sizable];

PROCEDURE (this:IHost) Unlock :: [closed]->[open];

PROCEDURE (this:IHost) Lock :: [open]->[closed];

PROCEDURE (this:IHost) HostItem :: [open]->[open,item];

PROCEDURE (this:IHost) FreeItem :: [open,item]->[open];

PROCEDURE (this:IItem) Host :: [unbound]->[hosted];

PROCEDURE (this:IItem) Free :: [hosted]->[unbound];
Listing 4.2: Parent interfaces: IHost, IItem, ISizeable
Figure 4.5: Protocols for ISizeable, IHost, and IItem (clockwise from upper left).
 1 TYPE
 2   IHostableContainer =
 3     POINTER TO INTERFACE RECORD (+ ISizeable + IItem + IHost)
 4       STATE closed :=
 5         {IHost.closed, IItem.hosted, ISizeable.sizable};
 6       INIT [layout];
 7     END;
 8
 9 PROCEDURE (this: IHostableContainer) Layout
10   :: [closed]->[closed];
Listing 4.3: Child interface: IHostableContainer
Figure 4.6: Protocol for IHostableContainer.
the protocols of its parent interfaces through the closed alias token, and adds a new arrow for the Layout method.
4.1.2 Objects and Negotiation
While the protocols of negotiable interface types allow specification of some temporal behaviours of component roles, they are of little use if they are not formally checked and enforced at the programming language level. Pure static checking of Mentok’s negotiable interface protocols would require at the least global alias analysis (impossible for incomplete component applications), or an alias protection scheme such as that in [CPN98]. Instead, protocol checking in Mentok is partially dynamic. Each object that provides negotiable interfaces, called a negotiable object, holds a multiset of tokens, corresponding to a negotiable interface protocol state. Clients wishing to use the services of a negotiable object must negotiate for state, using Mentok’s USE command. A successful negotiation results in the client obtaining a stateful reference to the object, which guarantees the ability to call the requested services.
4.1.2.1 Classes and Negotiable Objects
Token ::= Ident
State ::= "[" [ Token { "," Token } ] "]"
Behaviour ::= State "->" State
TypeIdent ::= Ident [ TypeIdent "." TypeIdent ]
Parameter ::= Ident ":" Type [ "::" Behaviour ]
Parameters ::= [ Parameter { ";" Parameter } ]
RecordBody ::= "RECORD" [ "(" { "+" TypeIdent } ")" ] Fields "END"
MethodImp ::= "PROCEDURE" "(" Ident ":" TypeIdent "." TypeIdent [ "::" Behaviour ] ")"
              Ident "(" Parameters ")" [ ":" TypeIdent ] [ "::" Behaviour ] ";"
              Statements
              "END" Ident
Figure 4.7: Syntax for implementing methods.
Partial syntax for record declarations and method implementation in Mentok is given in figure 4.7. Record declarations in Mentok are similar to those in GPCP (section 3.1.3), in that each record or class can implement zero or more interfaces, and must provide a signature-matching method implementation for each method declared by its interfaces. The syntax for method implementations provides for an optional re-entrant behaviour, optional cooperative behaviours on parameters, and an optional method behaviour. Instances of records that implement negotiable interfaces are called negotiable objects. For each negotiable interface it provides, a negotiable object holds a multiset of the interface’s tokens, representing a state in the interface’s protocol graph. When the object is first created (i.e. by a NEW statement), it is assigned the initial state declared by each interface it implements. This state is then transformed by the object’s negotiable interface methods; as a method is called, the “in” multiset of the method behaviour is consumed, and when the method returns, the “out” multiset is returned. Before a method can be called, however, clients must negotiate with negotiable objects to ensure that any method calls can legally be performed.
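The bookkeeping described above, one token multiset per implemented interface with the “in” multiset consumed at call time and the “out” multiset returned on return, can be sketched in Python as follows. The sketch deliberately leaves out negotiation, and the class NegotiableObject, its invoke method, and the table layout are assumptions of the example; only the initial states and behaviours are taken from listing 4.2.

```python
from collections import Counter

# Initial states and some method behaviours copied from listing 4.2.
INIT = {"IHost": ["closed"], "IItem": ["unbound"]}
BEHAVIOURS = {
    ("IHost", "Unlock"): (["closed"], ["open"]),
    ("IHost", "HostItem"): (["open"], ["open", "item"]),
    ("IItem", "Host"): (["unbound"], ["hosted"]),
}

class NegotiableObject:
    def __init__(self, interfaces):
        # NEW: one token multiset per implemented interface, set to its INIT state.
        self.states = {i: Counter(INIT[i]) for i in interfaces}

    def invoke(self, interface, method, body=lambda: None):
        required, returned = map(Counter, BEHAVIOURS[(interface, method)])
        state = self.states[interface]
        if any(state[t] < n for t, n in required.items()):
            raise RuntimeError(f"{interface}.{method} is not available")
        self.states[interface] = state - required                    # "in" consumed at call
        body()                                                        # the method executes
        self.states[interface] = self.states[interface] + returned   # "out" on return

w = NegotiableObject(["IHost", "IItem"])
w.invoke("IHost", "Unlock")
w.invoke("IHost", "HostItem")
w.invoke("IHost", "HostItem")
host_state = w.states["IHost"]
item_state = w.states["IItem"]
```

Each call to HostItem adds another item token, which is one way a protocol such as IHost’s can have an unbounded number of legal states.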
Token ::= Ident
State ::= "[" [ Token { "," Token } ] "]"
Behaviour ::= State "->" State
Use ::= "USE" Ident "::" Behaviour "DO" Stmts
        { "|" Ident "::" Behaviour "DO" Stmts }
        [ "ELSE" Stmts ]
        "END"
Stmts ::= Use | If | For | ...
Figure 4.8: Syntax for USE.
4.1.2.2 Negotiation
Negotiation is the term given to type-testing and guarding of negotiable interface protocols. Negotiation is a dynamic check that consists of a request and a promise from a client to an object, in the form of a negotiation behaviour; the in-state of the behaviour is the request for state, and the out-state is the promised transformed state. A successful negotiation gives a client a stateful reference to a negotiable object that initially holds the requested in-state multiset of the negotiation behaviour. The client may then invoke a sequence of methods via the stateful reference, transforming the multiset of tokens associated with the stateful reference until the out-state is reached, at which point the negotiation completes and the state is returned to the object.
Negotiation in Mentok is provided by the USE statement, the syntax of which is given in figure 4.8. The USE statement is similar to Component Pascal’s WITH statement, in that it has multiple branches and performs a type test and guard on a variable of the statement. In the case of the USE statement, the variable is called the subject of the negotiation, and the test and guard are on the protocol state of the object to which the subject refers. As with WITH, branches are tested sequentially, and when a branch test succeeds, that branch is executed with the knowledge that the subject meets the requirements of the test; no further branches are tested or executed. In the case of USE statements, a successful test is known as a successful negotiation, and the subject of the negotiation becomes a stateful reference for the body of the successful branch. A stateful reference is a reference to a negotiable object that has been granted exclusive use of some part of that object’s state. A successful negotiation not only tests negotiable object state, but locks it for use by the negotiation client.
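The request, lock, transform and return cycle of a negotiation can be sketched in Python for a subject that is not yet stateful. This is a loose model under stated assumptions, not the semantics of USE: the names NegotiableObject, StatefulRef and use are hypothetical, branch bodies are passed as callbacks, and the promise that the out-state is reached is checked at run time here purely for illustration.

```python
from collections import Counter

class NegotiableObject:
    def __init__(self, init):
        self.state = Counter(init)

class StatefulRef:
    """Reference holding the tokens granted by a successful negotiation."""
    def __init__(self, held):
        self.held = Counter(held)

    def call(self, required, returned):
        # A method call transforms the tokens held by the reference.
        req, ret = Counter(required), Counter(returned)
        if any(self.held[t] < n for t, n in req.items()):
            raise RuntimeError("method not available in the held state")
        self.held = self.held - req + ret

def use(obj, branches, otherwise=None):
    """Try each (in_state, out_state, body) branch in order, like the branches of USE."""
    for in_state, out_state, body in branches:
        req = Counter(in_state)
        if all(obj.state[t] >= n for t, n in req.items()):
            obj.state = obj.state - req              # lock the requested tokens
            ref = StatefulRef(req)
            body(ref)
            if ref.held != Counter(out_state):       # the promise was not kept
                raise RuntimeError("negotiation did not reach its out-state")
            obj.state = obj.state + ref.held         # hand the state back to the object
            return True                              # no further branches are tested
    if otherwise is not None:
        otherwise()
    return False

# An IBar object from listing 4.1 starts in [X,X]; the first branch asks for [Y] and fails.
bar = NegotiableObject(["X", "X"])
ok = use(bar, [
    (["Y"], ["X", "X"], lambda r: r.call(["Y"], ["X", "X"])),    # would call Bar2
    (["X", "X"], ["Y"], lambda r: r.call(["X", "X"], ["Y"])),    # calls Bar1
])
```

While a branch body runs in this sketch, the locked tokens are absent from the object, so a second negotiation for the same tokens would fail; this is one way to picture the exclusive use granted by a stateful reference.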
The USE statement has different semantics depending on whether the negotiation subject is a stateful reference or not. If the subject is already stateful (perhaps because of an outer USE statement), the USE command has local negotiation semantics. If the subject is not stateful, the USE command has global negotiation semantics.
USE
1.
i :: ??
2.
E obj. (i* = obj) &