Lazy Interworking of Compiled and Interpreted Code for Sandboxing and Distributed Systems

Lazy Interworking of Compiled and Interpreted Code for Sandboxing and Distributed Systems Camil Staps August 16, 2019 Supervisor: prof. dr. dr.h.c. ir. M.J. Plasmeijer Second reader: drs. J.H.G. van Groningen Contents Preface 5 1 Paper to be presented at IFL '19 7 2 File types and bytecode generation workflow 19 2.1 Bytecode generation . 19 2.2 Options in the Clean IDE and cpm ........................ 20 2.3 File formats . 21 2.3.1 Bytecode (.bc)............................... 21 2.3.2 Prelinked bytecode (.pbc)......................... 22 3 Building the source 25 3.1 Source code overview . 25 3.2 Build workflow . 25 3.2.1 Bytecode generation toolchain . 25 3.2.2 Library object files . 26 3.2.3 Standalone interpreter . 26 3.2.4 Graphical debugger . 26 3.2.5 WebAssembly interpreter . 26 3.3 32-bit builds . 27 4 Future work 29 4.1 Safe interpretation in Soccer-Fun . 29 4.2 A JavaScript debugger . 29 4.3 The Clean compiler in the browser . 30 4.4 End-to-end tests for iTasks with Selenium . 30 3 Preface This thesis is the result of approximately one and a half year of work on the development of an interpreter for the lazy functional programming language Clean and its application in various contexts. As will be explained below, interpretation is done on the level of the intermediate language for the abstract ABC machine. When I became involved in the project, John van Groningen had already written a 32- bit ABC interpreter targeting C and asm.js with near-complete support for the entire ABC instruction set. With Erin van der Veen, I developed a 64-bit version of the C interpreter and started to put it to use to communicate lazy values between different binaries and platforms. Because the evaluation of a lazy value may depend on program code which may not be present in the receiving executable, the interpreter is here used to make the communication platform-independent (although other options, like just-in-time compilation combined with dynamic linking, would have been suitable as well). Interworking of compiled and interpreted code was in a very early stage when the project with Erin van der Veen finished and was finalized only at the end of my research internship. As my master's thesis project, I then ported the interpreter to WebAssembly and applied it in the iTasks framework for developing workflow web applications. This framework needs a way to run Clean in web browsers and interact with the Document Object Model (DOM) of the web page as well as with third-party JavaScript libraries. Previously, this framework used just-in-time compilation to JavaScript, but our new setup with an interpreter in WebAssembly yields a more predictable and overall better performance. Summarizing, this thesis describes three main deliverables: • A standalone (C) interpreter, with a suitable bytecode format and the accompanying toolchain (bytecode generator, linker, and stripper). • A Clean \serialization" library capable of (de)serializing lazy values in a platform-independent and executable-agnostic manner. • A new backend for the browser interface of iTasks, together with a JavaScript foreign function interface. The noteworthy aspects of each of these have been described in detail in an article ac- cepted for presentation at the 2019 edition of Implementation and Application of Functional Languages (IFL). A preliminary version of this article constitutes the first and main chapter of this thesis. References are included in the paper and will be omitted in the rest of the thesis. The remainder of this thesis is more useful as a guide to using and further developing the software described here: chapter 2 describes the bytecode generation workflow, IDE options, and file types; chapter 3 how the tools in the project can be built. In chapter 4 I give a number of ideas for future projects. I am deeply grateful for John's gladness to explain the finest details of the implementation of Clean and for Rinus' continuous watch on possibly related projects and application areas, for their overall supervision and the great amount of freedom they let me have. 5 6 Chapter 1. Paper to be presented at IFL '19 7 Lazy Interworking of Compiled and Interpreted Code for Sandboxing and Distributed Systems Camil Staps John van Groningen Rinus Plasmeijer [email protected] [email protected] [email protected] Radboud University Nijmegen Radboud University Nijmegen Radboud University Nijmegen Nijmegen, The Netherlands Nijmegen, The Netherlands Nijmegen, The Netherlands ABSTRACT KEYWORDS More and more applications rely on the safe execution of untrusted interpreters, functional programming, laziness, sandboxing, Web- code, for example in the implementation of web browsers and Assembly plugin systems. Furthermore, these applications usually require some form of communication between the untrusted code and its ACM Reference Format: embedder, and hence a communication channel must be set up Camil Staps, John van Groningen, and Rinus Plasmeijer. 2020. Lazy Inter- in which values are serialized and deserialized. This paper shows working of Compiled and Interpreted Code for Sandboxing and Distributed that in a functional programming language we can solve these Systems. In Proceedings of International Symposium on Implementation and two problems at once, if we realize that the execution of untrusted Application of Functional Languages (IFL’19). ACM, New York, NY, USA, code is nothing more than the deserialization of a value which 12 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn happens to be a function. To demonstrate this, we describe the implementation of a serialization library for the language Clean, which internally uses an interpreter to evaluate untrusted code in a separate, sandboxed environment. Remarkable is that despite 1 INTRODUCTION the conceptual asymmetry between “host” and “interpreter”, lazy The execution of untrusted code, or code that is unknown to the interworking must be implemented in a highly symmetric fashion, executable when it is started, has become paramount in a large much akin to distributed systems. The library interworks on a low number of different contexts. The most common example isthe level with the native Clean program, but has been implemented web browser, which has to be able to run JavaScript code (or more without any changes to the native runtime system. It can therefore recently WebAssembly [14]) which can interact with the web page easily be ported to other programming languages. but should under no circumstance crash the browser or the render- We can use the same technique in the context of the web, where ing engine. we want to be able to share possibly lazy values between a server Another use case can be found in plugin systems. We may for and a client. In this case the interpreter runs in WebAssembly in example imagine a compiler which can load plugins for various the browser and communicates seamlessly with the server, written language extensions, where plugins can provide functions for trans- in Clean. We use this in the iTasks web framework to handle com- forming the abstract syntax tree. In this case it is less important munication and offload computations to the client to reduce stress that plugins cannot crash the host program, but the plugin has to on the server-side. Previously, this framework cross-compiled the interwork on a much more fine-grained level with the host program, Clean source code to JavaScript and used JSON for communication. interacting closely with the same data types. The interpreter has a more predictable and better performance, and A similar need is felt in workflow systems like the iTasks frame- integration is much simpler because it interworks on a much lower work for Task-Oriented Programming [24, 25]. Applications written level with the web server. in this framework center around various tasks and the workflows that users walk through to execute them. Currently, applications CCS CONCEPTS must be recompiled to add new possible workflows. However, if • Software and its engineering → Functional languages; • In- we could dynamically add code to a running executable, the iTasks formation systems → Browsers; • Computer systems organi- server could remain online. zation → Client-server architectures. In many of these contexts, we furthermore want the “untrusted code” or the “plugin” to be able to be distributed in a platform- independent manner: there are no different JavaScript versions for different binary platforms, nor would one expect to have to Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed download a different compiler plugin on Windows than on Linux. for profit or commercial advantage and that copies bear this notice and the full citation This complicates the matter, because it means we cannot simply on the first page. Copyrights for components of this work owned by others than ACM distribute machine code which can be linked dynamically. must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a Lastly, in a lazy functional programming language, we expect fee. Request permissions from [email protected]. that the interface between the host program and the added code is IFL’19, September 2020, Singapore lazy as well. For instance, it should in principle be possible to have © 2020 Association for Computing Machinery. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00 a plugin in some system which computes an infinite list of integers, https://doi.org/10.1145/nnnnnnn.nnnnnnn as long as the host only requests a finite amount of them. 8 Chapter 1. Paper to be presented at IFL '19 IFL’19, September 2020, Singapore Camil Staps, John van Groningen, and Rinus Plasmeijer 1.1 The solution in a nutshell a web application are usually written in different languages.

Lazy Interworking of Compiled and Interpreted Code for Sandboxing and Distributed Systems

CSCI 2041: Lazy Evaluation

Reversible Programming with Negative and Fractional Types

Let's Get Functional

Notes on Functional Programming with Haskell

Design Patterns in Ocaml

Scala by Example (2009)

Control Flow

210 CHAPTER 7. NAMES and BINDING Chapter 8

61A Lecture 6 Lambda Expressions VS Currying

Exam 1 Review: Solutions Capture the Dragons

CSE 341 : Programming Languages

Top Functional Programming Languages Based on Sentiment Analysis 2021 11