Dynamic Dispatch in Object-Oriented Languages
Total Page:16
File Type:pdf, Size:1020Kb
Dynamic Dispatch in Ob ject-Oriented Languages y Scott Milton and Heinz W. Schmidt TR-CS-94-02 [email protected], [email protected] January 1994 Abstract Dynamic binding in ob ject-oriented languages is p erhaps the most imp ortant semantic asp ect of these languages. At the same time it can contribute to ineciency and lackof robustness b ecause it incurs lo okup overheads on function calls and hinders the compiler determining the exact typ e of ob jects held in variables or returned by functions. This may, for instance, preclude inlining of small functions or attribute o set computation at compile time. Yet attribute accesses are the most frequently executed op erations. As a result, to regain lost p erformance, OO programmers are tempted to break the encapsulation of classes or want explicit control over dynamic dispatch, trading o extensibility. In the implementation of parallel ob ject-oriented languages the additional complication arises that ob ject accesses may require more exp ensive remote memory accesses. Lo okup at the call may b e inappropriate if the co de has to b e executed on a di erent pro cessor and there p erhaps has a di erent address. This pap er summarizes dispatching as addressed in several mo dern ob ject-oriented languages. We then describ e and b enchmark fast and exible dispatchschemes that we are currently implementing on SPARC based workstations and multi-pro cessors. These involve elements of C++ virtual function tables and Ei el's and Sather's ability to rede ne abstract functions as attributes. Initial b enchmarks seem to promise improved eciency on a range of mo dern RISC based architectures. Categories and Sub ject Descriptors: D.1.1 [Programming Techniques]: ob ject-oriented programming; D.3.2 [Programming languages]: Language Classi cation { object-oriented languages,C++, Ei el, Sather; D.3.3 [Programming languages]: Language Constructs { dynamic binding; D.3.4 [Programming languages]: Pro cessors { compilers, optimization; E.1 [Data]: Data Structures { tables ; General Terms: Algorithms, Design, Languages. Additional Keywords and Phrases: b enchmarks, dynamic dispatch, eciency, ob ject-oriented runtime system, multiple inheritance. Department of Computer Science, The Australian National University, Canb erra y CSIRO { Division of Information Technology, Canb erra 1 1 Intro duction Dynamic binding of function names to function de nitions or co de entry p oints is one of the central features of ob ject-oriented languages. It contributes to exibility and extensibility b ecause it allows new co de to b e called from unaltered old co de by passing instances of new typ es to old functions or assigning them to old variables. Conceptually, the ob jects themselves resp ond to calls. Function binding is at their discretion in ob ject-based languages or at the discretion of their classes in class-based languages. We call functions and attributes collectively features.Features are abstract if they are p olymorphic, i.e., rede nable for di erent typ es of ob jects. In some languages all features are abstract, in others one has to distinguish them upfront, i.e., b efore any rede nition is p ermitted or dynamic dispatch is supp orted. For instance, in C++ one uses the virtual keyword [27] to this end, or in CLOS, one declares functions as generic [1, 25,10]. The mechanism implementing dynamic binding is called dynamic dispatch. Typically it uses runtime typ e information to lo okup, or bind to, the prop er function. Systems like the Lisp-machine op erating system with ab out 2000 classes or Smalltalk systems with many hundreds of classes rely on this mechanism to make prototyping and incremental compilation fast, and systems in op eration robust in the presence of adho c extensions. There is a tension b etween dynamic dispatch and static typing. The exibility resulting from dynamic dispatch go es directly against the safety and increased compile time knowledge of data typ es needed for early error detection and for the b est optimisation. There are inherent tradeo s in howmuchchecking is p ossible at compile time versus howmuchmust b e done at runtime to guarantee that, for instance, optimiser assumptions ab out typing are preserved during execution. The dynamic lo okup is a ma jor obstacle to optimisation from two di erent viewp oints: rst, the lo okup time has to b e paid in addition to the function call; secondly, since di erent functions maybeinvoked by the same call in the course of execution, small functions can no longer b e inlined, and attribute o sets cannot b e computed statically,in general. Yet attribute accesses are the most frequently executed op erations. The inlinin g problem can also cause intra-pro cedural optimisations to b e deferred that would only b e p ossible after inlinin g in the context of an expanded function b o dy. This is particularly relevant with lo ops in OO programs when a generic iteration metho d is encap- sulated with a class and a caller drives these iterations in a lo op outside the corresp onding class. For eciency reasons, OO programmers are then tempted to break the encapsulation and/or demand more explicit control over dispatch while giving away extensibility. Eciency is critical for the wide use of any language. We are particularly interested in ecient dispatch mechanisms for our exp erimental high-p erformance Sather [20,21]. The implementation of ecient dynamic dispatch has b een the sub ject of several pap ers. For instance Rose [19] compares various low-level interrupt-based dispatching on sto ck hard- ware that makes single dispatch as fast as function calls at the cost of p ortability. Some implementations such as CLOS [11] and Sather [17, 18, 22] use hashing while others such as C++ and Ei el [15] use lo okup tables. CLOS calls a generic function which encapsulates the lo okup, while most other languages generate sp ecial call instructions. Sather and C++ [27] currently employvery di erent dynamic dispatching mechanisms. C++ ob jects have em- b edded p ointers to function lo okup tables to allow shared co de inherited functions to nd p ossibly rede ned functions in sub classes. Sather uses ob ject typ e tags and function keys to hash for function p ointers. 2 On top of the dispatching mechanisms various optimisations are used. A foremost goal is to avoid dynamic dispatch wherever p ossible. Languages likeC++ and Sather restrict proto- typing exibility to help the programmer and compiler control the asso ciated cost of dynamic dispatch. In C++ one distinguishes abstract functions with several di erent implementations by the virtual keyword when de ning them the rst time. Sather distinguishes dispatched calls according to the typ e of ob jects: the declaration x: A, allows x to hold only ob jects of exactly typ e A, while x: $A can hold ob jects of any subtyp e. In the former case calls to the ob ject can b e resolved at compile time. In addition, some languages such as CLOS, Self [4] and Sather try to o set the cost of dispatching by other optimisation, to o, including so-called customisation and function pointer caching. Function p ointer caches allowcheap lo okups if the function was recently lo oked-up for the same typ e of ob ject. Customisation duplicates and customises inherited co de for each sub class. With cus- tomisation, calls to self can b e resolved statically. Ful l customisation, which customises all functions in a given class, has another advantage: exclusively customised co de is going to ac- cess instances of this class. The class can therefore cho ose the layout of instances to improve compaction and alignment which can make an order of magnitude sp eed di erence on RISC architectures such as the DIGITAL Alpha. Furthermore, all attribute accesses to self can b e hardwired in the co de. The downside of full customisation is of course increased co de size and longer recompilation times, as changes in inherited functions may a ect several sub classes. In this pap er we brie y review di erent OO dispatching schemes, particularly fo cusing on C++ and Sather eciency needs, and complexity resulting from multiple inheritance. We rep ort initial b enchmarking results, and prop ose a new ecienthybrid scheme extending C++ function tables. We discuss this in the context of the Sather language but elements of the scheme can b e used with C++, Ei el or other MI languages. 2 Dynamic Dispatch: Concepts and Implementations In general terms, the dispatch problem can b e characterized as follows: Given a class name c andafeature name f , determine the actual featurelocation function pointer, attribute o set p = Locc; f : A naive implementation enco des Loc by a table of the size C F leading to a fast constant time lo okup, where C is the numb er of class names and F the numb er of feature names. Each column in this table can b e viewed as representing an abstract feature f that can lo okup a function p ointer or attribute o set given a class index. Dually, eachrow represents a function table for the corresp onding class c. Unfortunately, for large OO co des these tables are extremely sparse. The space requirements are unacceptable and the tables are not extensible. When new classes or features are added, a large table has to b e reorganised, which is unacceptable with incremental compilation. Various dispatchschemes therefore try to take advantage of the most common incremental compilation, viz., adding and compiling individual classes without requiring recomputation of tables for already compiled classes.