Downloads/Doe-Public-Access-Plan)

Really Embedding Domain-Specific Languages into C++ Hal Finkel Alexander McCaskey Tobi Popoola Leadership Computing Facility Computer Science and Mathematics Division Boise State University Argonne National Laboratory Oak Ridge National Laboratory Boise, ID, USA Lemont, IL, USA Oak Ridge, TN, USA [email protected] hfi[email protected] [email protected] Dmitry Lyakh Johannes Doerfert Leadership Computing Facility Leadership Computing Facility Oak Ridge National Laboratory Argonne National Laboratory Oak Ridge, TN, USA Lemont, IL, USA [email protected] [email protected] Abstract—Domain-specific languages (DSLs) are both perva- integrated with the code is the rest of the project. The DSL- sive and powerful, but remain difficult to integrate into large generated code provides and/or uses interfaces provided by projects. As a result, while DSLs can bring distinct advantages the rest of the source code, sometimes making use of non- in performance, reliability, and maintainability, their use often involves trading off other good software-engineering practices. In trivial types defined elsewhere in the project. Some DSL’s this paper, we describe an extension to the Clang C++ compiler to (e.g., Lex, Yacc) allow the programmer to embed code in support syntax plugins, and we demonstrate how this mechanism a host, general-purpose programming language (e.g., C/C++) allows making use of DSLs inside of a C++ code base without into the DSL’s input source files. While relatively convenient, needing to separate the DSL source code from the surrounding and often necessary for performance, this increases the number C++ code. Index Terms—compilers, LLVM, Clang, C++, parsing, domain- of, and complexity of, dependencies between the DSL source specific languages and the rest of the project. Practically, integration of DSLs into project build systems requires effort, and sometimes, specialized skills. Most importantly, programmers often try to I. INTRODUCTION keep related functions and/or components together in the same Domain-specific languages play an important part in our source file to make the source code easier to understand. While modern software ecosystem. Domain-specific languages are being forced to keep the DSL’s input in files separate from commonly used to generate lexical analyzers (e.g., Lex [1], the rest of the project source code may seem to be a trivial re2c [2]) and parsers (e.g., Yacc [3], ANTLR [4]). In high- inconvenience, it’s not. A majority, 60-90%, of software- performance computing, DSLs are used to generate numerical development costs are associated with reading and navigation kernels (e.g., SPIRAL [5], TCE [6], LGen [7], Linnea [8], source code as part of maintenance tasks [13]. Consistent with TACO [9], Devito [10]), and many code bases use custom our experience, when examining how programmers use their code generators (e.g., the codelet generator in FFTW [11], editors to perform these kinds of tasks, 35% of their time, on Kranc in the Einstein Toolkit [12]). average, is spent navigating between dependencies [14]. Thus, arXiv:2010.08439v1 [cs.PL] 16 Oct 2020 Unfortunately, these DSLs are often difficult to integrate the productivity of using DSLs can be significantly improved into a larger software project. At its root, one significant source by reducing logically-undesirable partitioning of source code of difficulty is due to the need to provide the input to the between different files. DSL’s translator in one or more source files separate from In this paper, we describe a new technique for integrating the source files containing the bulk of the project’s source DSLs into C++ source code: the syntax plugin. The imple- code. The input to the DSL, however, needs to be somehow mentation is built on the Clang C++ compiler [15]. Clang can compile C++ code to executable form using the LLVM This manuscript has been authored by UT-Battelle, LLC under Contract No. compiler infrastructure [16], but can also directly perform DE-AC05-00OR22725 with the U.S. Department of Energy. The United States source-code analysis [17] and rewriting. While performing Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid- any of these tasks, Clang can make use of user-provided up, irrevocable, world-wide license to publish or reproduce the published form plugins. However, no existing plugin interface provides the of this manuscript, or allow others to do so, for United States Government ability to integrate DSLs as described here, and so as described purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access below, a new plugin interface was designed and implemented. Plan. (http://energy.gov/downloads/doe-public-access-plan). While we do not quantitatively evaluate productivity gains, clang++ -c source.cpp \\ #include"clang/Frontend/FrontendPluginRegistry.h" -fplugin=/path/to/somePlugin.so ...// other includes. using namespace clang; Fig. 1: An example showing how Clang can be invoked so that namespace{ it loads (from a shared library named somePluginLib.so) class ExampleASTConsumer : public ASTConsumer { CompilerInstance &Instance; and uses a plugin. public: ExampleASTConsumer(CompilerInstance &Instance) : Instance(Instance) {} or other factors affecting developer willingness to use DSLs, in this paper, it is our sense that the presented technique bool HandleTopLevelDecl(DeclGroupRef DG) override { increases developer productivity and otherwise lowers barriers // Do something to handlea new declaration. to adopting DSLs inside of C++ code bases. return true; The implementation of Clang with support for syn- } tax plugins is available from: https://github.com/hfinkel/ void HandleTranslationUnit(ASTContext& context) llvm-project-csp override { // Do something to handle the completion of The remainder of this paper is organized as follows: Sec- // the translation unit. tion II reviews Clang’s plugin support, Section III describes } the implementation of the syntax-plugin infrastructure within }; Clang, Section IV describes how syntax plugins enable em- class ExampleASTAction : public PluginASTAction { bedding TACO into C++, Section V describes embedding protected: programming languages for quantum computing into C++, std::unique_ptr<ASTConsumer> CreateASTConsumer( CompilerInstance &CI, llvm::StringRef) Section VI describes embedding a DSL for tensor computation override { into C++, and Section VII concludes with a discussion of return future directions. std::make_unique<PrintFunctionsConsumer>(CI); } bool ParseArgs( ACKGROUND LANG LUGINS II. B :C P const CompilerInstance &CI, const std::vector<std::string> &args) Clang plugins are loaded from shared libraries provided on override { Clang’s command line and are integrated into Clang’s source- // Handle custom plugin command-line arguments. return true; code processing. Figure 1 shows the relevant command-line } arguments. void PrintHelp(llvm::raw_ostream& ros) { A Clang plugin takes the form of a set of classes, each of ros <<"A help message goes here\n"; which derives from some appropriate Handler class, and each } of which is registered using an appropriate static object. Clang }; provides several kinds of handlers [18]: class ExamplePragmaHandler : public PragmaHandler { public: • PragmaHandler - Used by a plugin to provide new ExamplePragmaHandler() : kinds of pragmas. PragmaHandler("an_example"){} • ParsedAttrInfo - Used by a plugin to provide new void HandlePragma(Preprocessor &PP, kinds of attributes. PragmaIntroducer Introducer, • PluginASTAction - Used by a plugin to provide Token &PragmaTok) override { an AST listener, an object that can observe AST-node // Handle an encountered: //#pragma an_example more tokens creation events. } As illustrated in Figure 2, a single plugin can have multiple }// anonymous namespace handlers, including multiple kinds of handlers, and use all of static FrontendPluginRegistry::Add<ExampleASTAction> them together in order to provide its functionality. X("example-plugin","an example sketch"); It is worth noting that this interface is tied directly to Clang’s static PragmaHandlerRegistry:: AST data structures. This means that the plugins are not only Add<ExamplePragmaHandler> specific to Clang, but also, in practice, tied to the particular Y("example-plugin-cntrl","enable something"); version of Clang against which the plugin was compiled. Pro- gramming the plugin requires knowledge of the data structures Fig. 2: An example showing how Clang plugins are used by Clang’s lexical analysis and parsing infrastructure, structured. This example sketches a plugin using the along with Clang’s AST. For a more-complete example, we PragmaHandler interface to provide a new pragma and the refer the reader to examples/AnnotateFunctions in PluginASTAction to observe relevant AST-node creation Clang’s source repository [19]. events. III. CLANG SYNTAX PLUGINS #include"clang/Frontend/FrontendPluginRegistry.h" ...// other includes. None of the existing plugin interfaces allow the plugin to using namespace clang; alter the fundamental syntax accepted by Clang’s parser. Here namespace{ we explore a plugin interface that allows for exactly that: class ExampleSyntaxHandler : public SyntaxHandler { within a specifically-tagged function, the function body may public: contain code that is not valid C++ code. The code in the ExampleSyntaxHandler() : SyntaxHandler("example"){} function body must still obey

Load more