High-Performance Language Interoperability for Scientific Computing through Babel

Thomas G. W. Epperly, Gary Kumfert, Tamara Dahlgren, Dietmar Ebner, Jim Leek, Adrian Prantl, Scott Kohn
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
Livermore, California 94551

Abstract: High-performance scientific applications are usually built from software modules written in multiple programming languages. This raises the issue of language interoperability, which involves making calls between languages, converting basic types, and bridging disparate programming models. Babel provides a feature-rich, extensible, high-performance solution to the language interoperability problem, currently supporting C, C++, FORTRAN 77, Fortran 90/95, Fortran 2003/2008, Python, and Java. Babel supports object-oriented programming features and interface semantics with runtime enforcement. In addition to in-process language interoperability, Babel includes remote method invocation to support hybrid parallel and distributed computing paradigms.

I. INTRODUCTION

Babel is a programming language interoperability toolkit for high-performance scientific computing. It was designed to address specific functional and performance needs in the development of large-scale, multi-physics simulations involving the integration of multiple mathematical models, libraries, and solvers implemented in different programming languages. The inherent complexity of the resulting systems requires the aid of software tools for their development, evolution, and maintenance.

Multi-disciplinary, multi-physics, and multi-resolution applications are far too complex to be developed by a single organization. Hence, the parts (models, libraries, and solvers) are developed by code groups with relevant expertise. Each team often relies on different programming languages and development platforms. Some critical codes may even have been developed by experts long retired. These differences exacerbate the integration challenges that must be overcome to successfully create large-scale applications.

Interoperability between languages involving incompatible programming paradigms and type systems is inherently difficult. For example, dynamic memory management may be a feature of one language but left to the programmer in another. Errors may be reported explicitly in one language and through dynamic exceptions in another. Arrays may be stored in column-major or row-major order, and their indices may start at 0 or 1. These incompatibilities can make building, debugging, and maintaining associated software systems extremely challenging.

Additional features not natively available across languages are typically required by the numerical libraries that dominate scientific applications. Specifically, dynamic, multi-dimensional arrays, array strides, single- and double-precision complex numbers, and structures are common. The heavy reliance on arrays for managing numerical data is a critical aspect of these applications, and it can have a significant impact on performance.

Further adding to the challenges faced by the scientific computing community is the need for software tools to run on commercial as well as one-of-a-kind platforms. Computational scientists often develop their codes on desktop platforms and then port them to Top 500 machines (http://www.top500.org) for high-performance runs. Machine and language idiosyncrasies present unique portability challenges requiring a detailed understanding of binary interfaces and linkage conventions.

Finally, scientific simulation runs often take days to weeks to execute, so the introduction of additional overhead must be minimized. Babel was initially designed for fast, in-process communication to address this important issue. The project won the prestigious R&D 100 award [23] in 2006 for “the world’s most rapid communication among many programming languages in a single application.” While Babel’s primary focus is efficient interoperability within a single address space, it also fully supports transparent remote method invocation (RMI).

Large-scale, multi-physics, and multi-resolution computational science and engineering applications of today face significant integration challenges due to their use of numerically intensive, long-running codes written in different (including legacy) programming languages for deployment on one-of-a-kind platforms. Babel addresses the functional and performance needs of the community through a high-performance interoperability toolkit. The motivation for the technology and the approach taken to develop it are described in Sections II and III, respectively. Details of the toolkit are provided in Sections IV through VI. Applications of Babel are presented in Section VII. Section VIII covers the most relevant related work. Future work is presented in Section IX.

II. MOTIVATION

Interoperability solutions at the time Babel was initially conceived, such as CORBA (Common Object Request Broker Architecture) [35] and COM (Component Object Model) [38], tended to be geared more toward general and commercial interests. That is, they generally lacked support for the legacy programming languages and native data types commonly used by computational scientists and engineers. Features for aiding cross-language debugging were also missing.

Traditional scientific programming languages generally lack support for object-oriented programming (OOP), which has increasingly been adopted by the community. OOP codifies the discipline of data and procedure encapsulation, thereby facilitating the development of reusable software. The large set of supported languages, shown in Figure 1, emphasizes languages of interest to the community. Support for traditional scientific programming languages is important due to the significant amount and inherently long lifetimes of legacy codes written in those languages.

[Figure 1: diagram with the labels BABEL, C, C++, FORTRAN 77, Fortran 90/95, Fortran 2003/2008, Java, Python, and XML.]
Fig. 1. Programming languages supported in the Babel 2.0 release.

Babel, which bridges the gap among the different programming paradigms and languages, enables software written in classical imperative programming languages, such as Fortran and C, to interoperate with interpreted scripting languages, such as Python. To accomplish this feat, Babel must deal with different binary representations, symbol length limitations, and inconsistent rules for identifier declarations (e.g., case sensitivity or reserved symbols).

The numerically intensive computations performed in scientific applications are heavily dependent on the use of array-based and numerical data types. Of particular importance are dynamic, multi-dimensional arrays, array strides, single- and double-precision complex numbers, and structures. Arrays with all these features are not generally native types in modern, general-purpose languages.

Both scientific and general-purpose programming languages lack support for interface contracts, which are a well-known software engineering technique for improving testing and debugging [32]. Interface contracts define and enable the automated enforcement of software behaviors at call boundaries. They are also a logical extension of language interoperability solutions for the support of cross-language debugging.

The scientific computing community needs a tailored, high-performance language interoperability toolkit to facilitate the development of complex, large-scale, multi-physics, and multi-resolution applications. Traditional scientific and modern programming languages must be supported to accommodate legacy and new codes. Dynamic, multi-dimensional arrays are critical to the numerically intensive codes that must interoperate. Ensuring that these multi-language applications work correctly requires mechanisms, like interface contracts, for cross-language debugging.

[...] documentation. The supporting library is referred to as the “SIDL Runtime Library”. Figure 2 shows the major parts of the toolkit in relation to their use in a scientific software artifact.

[Figure 2: diagram with the labels Scientific Component, SIDL Specification, Babel, Glue Code, Babel SIDL Runtime Library, and User-Defined Impl.]
Fig. 2. Babel translates SIDL specifications to glue code and language-specific prototypes.

SIDL provides a declarative description of the public methods of the calling interface as extensions of the scientific object model. The model is defined through base classes, interfaces, methods, exceptions, and built-in types. SIDL, like the CORBA Interface Definition Language (IDL) provided by the Object Management Group (OMG) [35], [10], is programming language-neutral. Both IDLs support the modular packaging of full method definitions specifying the type (e.g., integer, float) and mode (i.e., in, out, inout) of each parameter. Both also support enumerations, arrays, and multiple inheritance of interfaces. Unlike CORBA IDL, SIDL provides basic types for complex numbers and for multi-dimensional, multi-strided arrays. Another distinguishing feature is complete support for polymorphism across programming language boundaries. For example, Python may be used to overload a specific method of a Fortran module, throwing an exception implemented in C++. Interface contract clauses with a rich set of expressions are also supported as an aid to testing and debugging.

The Babel compiler translates SIDL descriptions into wrappers used to map between programming language-specific types and the common representation layer. Native language features, such as built-in data types and method overloading, are leveraged