Arxiv:2105.10819V1 [Cs.PL] 22 May 2021 Eaestuck

1 Choosing is Losing: How to combine the benefits of shallow and deep embeddings through reflection ARTJOMS ŠINKAROVS, Heriot-Watt University, UK JESPER COCKX, TU Delft, Netherlands Dependently-typed host languages empower users to verify a wide range of properties of embedded languages and programs written in them. Designers of such embedded languages are faced with a difficult choice between using a shallow or a deep embedding. The former is easier to use because the entire infrastructure of the host langauge is immediately available. Meanwhile, the latter gives full access to the structure of embedded programs, but is difficult to use in practice, especially when the embedded language is itself dependently typed. The main insight presented in this paper is that the choice between shallow and deep embedding can be eliminated by working in a host language with reflection capabilities: we start from a shallow embedding that can use all libraries and tools of the host language, and later use reflection to expose the deep structure of the embedded programs. Concretely, we apply this technique to embed three programming languages — Kaleidoscope, SaC, and (a subset of) APL — into the dependently typed theorem prover Agda, using dependent types to statically enforce several properties of interest. We then use Agda’s reflection capabilities to extract the embedded programs back into the original language, so that the existing toolchain can be leveraged. In this process, statically verified properties of the host language are mapped onto runtime checks in the target language, allowing extracted programs to interact safely with existing code. Finally, we demonstrate the feasibility of our approach with the implementation and extraction of a convolutional neural network in our embedding of APL. ACM Reference Format: Artjoms Šinkarovs and Jesper Cockx. 2018. Choosing is Losing: How to combine the benefits of shallow and deep embeddings through reflection. Proc. ACM Program. Lang. 1, CONF, Article 1 (January 2018), 28 pages. 1 INTRODUCTION Dependently-typed systems such as Coq [Bertot and Castran2010],Agda[Norell 2008],Lean[de Moura et al. 2015] or Idris [Brady 2013] make it possible to write down specifications of programs as types and check that those specification are satisfied during type checking. These verified programs can be ei- ther executed directly, or compiled into a general-purpose target language. However, if we want to arXiv:2105.10819v1 [cs.PL] 22 May 2021 generate code targeting a specific tool or language that is not supported by the existing backends, we are stuck. There are two common approaches to this problem: i) extending the toolchain to produce code in the desired target; or ii) defining a syntactic representation of the target language and implementing the program in it. The first approach gives the most freedom but it may be technically challenging, as one has to modify the internals of the theorem prover. The second approach forces us to define a deep embedding and express our specifications in it. While this seems easier up-front, the resulting encodings of the specifications get impractically large, especially when the embedded language is itself dependently typed. In this paper we consider an approach that combines the benefits of the two aforementioned solutions. We use a shallow embedding of the target language and rely on compile-time metaprogramming to define custom extractors, making it possible to write a verified program hand-in-hand Authors’ addresses: Artjoms Šinkarovs, Heriot-Watt University, Edinburgh, Scotland, EH14 4AS, UK, [email protected]. uk; Jesper Cockx, TU Delft, Delft, 2628 XE, Netherlands, [email protected]. 2018. 2475-1421/2018/1-ART1 $15.00 https://doi.org/ Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. 1:2 A. Šinkarovs and J. Cockx with an executor for it — all within a single environment. No need to touch internals of the host language; no encoding artefacts. Specifically, we use reflection [Anand et al. 2018; Christiansen and Brady 2016; Ebner et al. 2017; van der Walt and Swierstra 2013] (also known as direct reflection [Barzilay 2005]),aformofcompile- time metaprogramming.1 That is, the host language has quote/unquote primitives that translate expressions to their internal representation and back. The internal representation is regular data that can be accessed and manipulated as usual. The main goal of reflection is to automate routine tasks of writing boilerplate code, e.g. tactics. However, APIs are typically also complete enough for general code generation tasks such as writing custom compiler backends. In this paper, we are interested in writing verified embedded programs that can be extracted to an existing language. We approach this problem by using an embed-typecheck-extract loop. That is, we first embed a language or a subset of it into a theorem prover, using dependent types to encode properties of interest; next, we use the typechecker of the host language to verify the properties; and finally, we use a custom extractor to translate our implementation to the target language. Our approach is fine-grained: embeddings (and consequently extractors) do not have to cover the entire language, only the subset that is sufficient to specify the problem of interest. Also, instead of implementing the entire application in a theorem prover, one might chose a particular fragment and integrate it into an existing codebase. Our proposed approach has several advantages: • Firstly, working with a shallow embedding is much easier than working with a deep one. While deep embeddings provide full access to the program representation, they require us to encode the full syntax and typing rules of the target language, which soon becomes imprac- tical for embedded languages with dependent types [McBride 2010]. This matters because we use dependent types in the embedding to encode the properties of interest. In contrast, shallow embeddings do not require any additional encoding, since they are regular programs in the host language, and can make use of all available libraries and tools. • Secondly, specifying embeddings and extractions in the same dependently-typed framework means we can modify the extractor without rebuilding the whole compiler of the host language. Extractors can be fine-tuned for the particular application, e.g. treating a chosen func- tion in a specific way. Extractors can also make full use of dependent types, ensuring that the extraction process is sound. • Thirdly, since the embedded program and the extractor are developed side-by-side, we can make use of features of the host language to customise the behaviour of the extractor further. Concretely, we use Agda’s rewrite rules as a lightweight method of applying verified domain- specific optimizations during extraction. We implement extractors for Kaleidoskope [LLVM development team 2021] (Section 3) and SaC [Scholz 2003] (Section 4) in Agda. The former is a minimalist language that we use to demonstrate the basic extraction principles. The latter is a high-performance array language that generates efficient code for various architectures, but it has a restrictive type system. Our SaC embedding guaran- tees program termination, in-bound array indexing, and safety of arithmetic operations. Finally, we also embed a small subset of APL [Iverson 1962] into Agda (Section 5), which is sufficient to encode a Convolutional Neural Network [Šinkarovs et al. 2019]. Embedding APL is challenging because the language is untyped, and its basic operators are heavily overloaded. We define all the basic operators of APL in terms of the SaC embedding, effectively obtaining a compiler from APL into executable binaries. 1This use of the word ‘reflection’ is not to be confused with reflection in Java and similar languages, which is a form of runtime metaprogramming. Proc. ACM Program. Lang., Vol. 1, No. CONF, Article 1. Publication date: January 2018. 1:3 Contributions. The key contributions of this paper are: • We introduce the concept of reflection-based extractors for shallowly-embedded languages that are embedded in metaprogramming-capable theorem provers. • We extend the reflection API of Agda to make it better suited for defining custom extractors. Specifically, we extend the clause syntax with telescopes (Section 3.4), we add new operations for selective reduction (Section 3.5), and we implement optional reconstruction of datatype parameters (Section 4.2). • We implement extractors for two languages: Kaleidoscope (Section 3) and SaC (Section 4). We use the latter as a basis for embedding a subset of APL (Section 5). • We demonstrate how the proposed embeddings can be used by encoding a real-world application and ensuring that it extracts correctly (Section 5.2). 2 BACKGROUND We start with a brief overview of key Agda constructions that are used in this paper. We also present relevant parts of the reflection API. For a more in-depth introduction to Agda refer to the Agda user manual [Agda development team 2021]. 2.1 Agda Basics Agda is an implementation of Martin-Löf’s dependent type theory [Martin-Löf 1998] extended with many convenience constructions such as records, modules, do-notation, etc. Types are defined using the following syntax: data N : Set where data Fin : N → Set where data _≡_ {a}{A : Set a} zero : N zero : ∀ {n} → Fin (suc n) (x : A) : A → Set a where suc : N → N suc : ∀ {n} → Fin n → Fin (suc n) refl : x ≡ x Unary natural numbers N is a type with two constructors: zero and suc. Fin is an indexed type, where the index is of type N. Constructor names can be overloaded and are disambiguated from the typing context, or can be prefixed with the type name: N.zero, N.suc. The Fin n type represents natural numbers that are bounded by n. In the definition of Fin, ∀ binds the variable without need- ing to specify its type.

Arxiv:2105.10819V1 [Cs.PL] 22 May 2021 Eaestuck

Experience Implementing a Performant Category-Theory Library in Coq

Why Untyped Nonground Metaprogramming Is Not (Much Of) a Problem

Practical Reflection and Metaprogramming for Dependent

Testing Noninterference, Quickly

An Anti-Locally-Nameless Approach to Formalizing Quantifiers Olivier Laurent

Mathematics in the Computer

Metaprogramming with Julia

Xmonad in Coq: Programming a Window Manager in a Proof Assistant

A Survey of Engineering of Formally Verified Software

Metamathematics in Coq Metamathematica in Coq (Met Een Samenvatting in Het Nederlands)

A Foundation for Trait-Based Metaprogramming

Metaprogramming in .NET by Kevin Hazzard Jason Bock