<<

GPP, the Generic

Tristan Miller Austrian Research Institute for Artificial Intelligence Freyung 6/3, 1010 Vienna, Austria ORCID: 0000-0002-0749-1100 Denis Auroux Department of Mathematics, Harvard University 1 Oxford Street, Cambridge, MA 02138, USA

Summary at providing even higher-level constructs, such as conditional loops and other control structures in In science, a preprocessor (or pro- (Meissner, 1975) and COBOL (Tri- cessor) is a tool that programatically alters its input, ance, 1980). The need for generalized, language- typically on the basis of inline annotations, to pro- independent tools was eventually recognized (McIl- duce data that serves as input for another program. roy, 1960), leading to the development of general- are used in software development and purpose preprocessors such as GPM (Strachey, 1965) document processing workflows to translate or ex- and ML/I (Brown, 1967). tend programming or markup languages, as well By the end of the 1960s, preprocessors had at- as for conditional or pattern-based generation of tracted a considerable amount of attention, by com- and text. Early preprocessors were rela- puting theorists and practitioners alike, and their tively simple string replacement tools that were tied use in software engineering had expanded beyond to specific programming languages and application the augmentation and adaptation of programming domains, and while these have since given rise to languages. A survey paper by Brown (1969) iden- more powerful, general-purpose tools, these often tified four broad application areas: language ex- require the user to learn and use complex macro tension, systematic searching and editing of source languages with their own syntactic conventions. In code, translation between programming languages, this paper, we present GPP, an extensible, general- and code generation (i.e., simplifying the writing of purpose preprocessor whose principal advantage is highly repetitive code, parameterizing a program by that its syntax and behaviour can be customized to substituting compile-time constants, or producing suit any given preprocessing task. This makes GPP variants of a program by conditionally including cer- of particular benefit to research applications, where tain statements or modules). While the first three of it can be easily adapted for use with novel markup, these application areas have largely been rendered programming, and control languages. obsolete by today’s integrated development envi- arXiv:2008.00840v1 [cs.PL] 3 Aug 2020 ronments and expressive, feature-rich programming languages, implementing software variability with Background language-specific and general-purpose preprocessors remains commonplace (Apel et al., 2013; Kstner Preprocessors date back to the mid-1950s, when et al., 2012). they were used to extend individual assembly lan- Text processing became another main application guages with constructs that would later be found in area for preprocessors, in particular to generate doc- high-level programming languages (Layzell, 1985). uments on the basis of user-specified conditions or These languages, in turn, fostered the development patterns, and to convert between document markup of yet more special-purpose preprocessors aimed languages (Walden, 2014). The earliest such uses

1

This is a preprint of the following publication: Tristan Miller and Denis Auroux. GPP, the generic preprocessor. Journal of Open Source Software, 5(51), July 2020. ISSN 2475-9066. DOI: 10.21105/joss.02400 were ad-hoc repurposings of – and Weinberg, 2020). While GPP is less powerful specific preprocessors to operate on human-readable than (Seindal et al., 2016), it is arguably more texts (Keese, 1964; Stallman and Weinberg, 2020); flexible, and supports all the operations ex- these were soon supplanted by text-specific macro pected of a modern, high-level preprocessing system, languages such as TRAC (Mooers and Deutsch, including conditional tests, arithmetic evaluation, 1965), which were positioned as tools for stenogra- and POSIX-style wildcard matching (“globbing”). phers and other writing professionals. More recently In addition to macros, GPP understands comments it has been common to use general-purpose prepro- and strings, whose syntax and behaviour can also cessors (Mailund, 2019; Pesch, 1992). be widely customized to fit any particular purpose.

Statement of Need GPP in research

Criticism of preprocessors commonly focuses on the GPP has already been integrated into a number of idiosyncratic languages they employ for their own third-party projects in basic and applied research. built-in directives and for users to define and in- These include the following: voke macros. The languages of early preprocessors were derided as “clumsy and restrictive” (Layzell, • The Waveform Definition Language (WDL) is 1985) and “hard to read” (Brown, 1969), and even Caltech Optical Observatories’ -like language modern preprocessors are sometimes attacked for for programming astronomical research cam- relying on “the clumsiness of a separate language eras. WDL uses GPP to preprocess configura- of limited expressiveness” (Ernst et al., 2002) or, tion files containing signals and parameters spe- at the other extreme, for being overly complicated, cific to the camera controllers, flags setting the quirky, opaque, or hard to learn, even for experi- devices’ operating modes and image properties, enced and markup users (Ernst et al., and timing rules. According to the develop- 2002; Paddon, 1993; Pesch, 1992). ers, GPP was chosen over the Our general-purpose preprocessor, GPP, avoids “for added flexibility and to avoid some C-like these issues by providing a lightweight but flexible limitations” (Kaye et al., 2017). macro language whose syntax can be customized • XSB is a research-oriented, commercial-grade by the user. The tool’s built-in presets allow its system and . directives to be made to resemble those of many The developers chose to GPP XSB’s de- popular languages, including HTML and T X. This E fault preprocessor because it “maintains a high greatly reduces the learning curve for GPP when it is degree of compatibility with the C preproces- used with these languages, eliminates the cognitive sor, but is more suitable for processing Prolog burden of repeatedly “mode switching” between programs” (Swift et al., 2017). source and preprocessor syntax when reading or composing, and allows existing syntax highlighters • C-Control Pro is a family of electronic mi- and other tools to process GPP directives with little crocontrollers produced by Conrad Electronic; or no further configuration. Furthermore, users are they are specifically designed for industrial and not limited to using these presets, but can fully automotive applications. The official software define their own syntax for GPP directives and development kit includes a modified version of macros. This makes GPP particularly attractive for GPP for use with the products’ BASIC and use in research and development, where its syntax Compact-C programming languages (Schirm can be readily adapted to match novel programming and Sprenger, 2007). and markup languages. GPP’s independence from any one programming • SUS is a tool that allows system administrators or makes it more versatile than to exercise fine-grained control over how users the C Preprocessor, which was formerly “abused” as can run commands with elevated privileges. It a general text processor and is still sometimes (inap- has a sophisticated control file syntax that is propriately) used for non-C applications (Stallman preprocessed with GPP (Gray, 2001).

2 Apart from these uses, GPP is occasionally cited as Dreiling, Alexander (July 2010). “Feature Mining: previous or related work in scholarly publications Semiautomatische Transition von (Alt-)Systemen on or compile-time variability of zu Software-Produktlinien”. Diploma thesis. software (Apel et al., 2013; Baxter and Mehlich, Fakultt fr Informatik, Institut fr Technische 2001; Behringer, 2017; Blendinger, 2010; Dreiling, und Betriebliche Informationssysteme, Otto-von- 2010; Kstner et al., 2012; Lotoreychik and Shopyrin, Guericke-Universitt Magdeburg. 2006; Zmiry, 2016). Ernst, Michael D., Greg J. Badros, and David Notkin (Dec. 2002). An Empirical Analysis of C Preprocessor Use. IEEE Transactions on Software Acknowledgments Engineering 28(12):1146–1170. issn: 0098-5589. doi: 10.1109/TSE.2002.1158288. Tristan Miller is supported by the Austrian Science Gray, Peter D. (Dec. 2001). SUS An Object Ref- Fund (FWF) under project M 2625-N31. Denis erence Model for Distributing UNIX Super User Auroux is partially supported by NSF grant DMS- Privileges. Proceedings of the LISA 2001 15th Sys- 1937869 and by Simons Foundation grant #385573. tems Administration Conference. The USENIX The Austrian Research Institute for Artificial Intelli- Association, pp. 15–18. gence is supported by the Austrian Federal Ministry Kstner, Christian et al. (June 2012). Type Checking for Science, Research and Economy. Annotation-Based Product Lines. ACM Trans- actions on Software Engineering and Methodol- References ogy 21(3):14:1–14:39. doi: 10 . 1145 / 2211616 . 2211617. Apel, Sven et al. (Oct. 2013). Classic, Tool-Driven Kaye, Stephen et al. (2017). Waveform Definition Variability Mechanisms. Feature-Oriented Soft- Language. Tech. rep. Pasadena, CA: Caltech Op- ware Product Lines. Berlin/Heidelberg: Springer- tical Observatories. Verlag. isbn: 978-3-642-37520-0. doi: 10.1007/ Keese Jr., W. M. (Sept. 1964). A Note on Auto- 978-3-642-37521-7_5. matic Generation of Documentation by Macro Baxter, Ira D. and Michael Mehlich (2001). Pre- Assemblers. Technical memorandum TM-64-1031- processor Conditional Removal by Simple Partial 1. Washington, : Bellcom, Inc. Evaluation. Proceedings of the 8th Working Con- Layzell, P. J. (Jan. 1985). The History of Macro ference on Reverse Engineering. IEEE, pp. 281– Processors in Programming Language Extensi- 290. isbn: 0-7695-1303-4. doi: 10.1109/WCRE. bility. The Computer Journal 28(1):29–33. issn: 2001.957833. 0010-4620. doi: 10.1093/comjnl/28.1.29. Behringer, Benjamin (July 2017). “Projectional Lotoreychik, V. Yu. and D. G. Shopyrin (2006). Editing of Software Product Lines The PEoPL Metaprogrammirovaniye na osnove tekstovogo Approach”. PhD thesis. Faculty of Sciences, Tech- preprotsessora [Text PreprocessorBased Metapro- nology and Communication, Universit de Luxem- gramming]. Nauchno-Tehnicheskii Vestnik Infor- bourg. matsionnykh Tekhnologii, Mekhaniki i Optiki [Sci- Blendinger, Frank (Aug. 2010). “A Filesystem- entific and Technical Journal of Information Tech- Based Approach to Support Product Line De- nologies, Mechanics and Optics] 6(2):57–65. issn: velopment with Editable Views”. Diploma Thesis. 2226-1494. Department of Computer Sciences 4, Friedrich- Mailund, Thomas (2019). Preprocessing. Introduc- Alexander University Erlangen-Nuremberg. ing Markdown and Pandoc: Using Markup Lan- Brown, P. J. (Oct. 1967). The ML/I Macro Proces- guage and Document Converter. Berkeley, CA: sor. Communications of the ACM 10(10):618–623. Apress. isbn: 978-1-4842-5148-5. doi: 10.1007/ issn: 0001-0782. doi: 10.1145/363717.363746. 978-1-4842-5149-2_10. Brown, P. J. (1969). A Survey of Macro Proces- McIlroy, M. Douglas (Apr. 1960). Macro Instruction sors. Annual Review in Extensions of Compiler Languages. Communica- 6:37–88. issn: 0066-4138. doi: 10.1016/0066- tions of the ACM 3(4):214–220. issn: 0001-0782. 4138(69)90001-9. doi: 10.1145/367177.367223.

3 Meissner, Loren P. (Sept. 1975). On Extending For- tran Control Structures to Facilitate . SIGPLAN Notices 10(9):19–30. issn: 0362-1340. doi: 10.1145/987316.987320. Mooers, Calvin N. and L. Peter Deutsch (Aug. 1965). TRAC, a Text-Handling Language. ACM ’65: Proceedings of the 20th National Conference. Ed. by Lewis Winner. New York: Association for Com- puting Machinery, pp. 229–246. isbn: 978-1-4503- 7495-8. doi: 10.1145/800197.806048. Paddon, Michael (1993). Shake: A Portable Tool for Generating Makefiles. AUUG ’93 Conference Proceedings. Kensington, NSW, Australia: AUUG Inc., pp. 145–156. Pesch, . H. (1992). Configurable Manuals. Confer- ence Record on Crossing Frontiers, pp. 776–780. isbn: 0-7803-0788-7. doi: 10.1109/IPCC.1992. 673146. Schirm, Reiner and Peter Sprenger (2007). Der Preprozessor. Messen, Steuern und Regeln mit C-Control Pro: Praxisanwendungen, Schaltung- stechnik und Programmierung. Poing, Germany: Franzis. isbn: 978-3-7723-4097-0. Seindal, Ren et al. (Dec. 2016). GNU M4, Version 1.4.18. Free Software Foundation. Stallman, Richard M. and Zachary Weinberg (2020). Overview. The C Preprocessor. GCC 10.1.0. Free Software Foundation. Strachey, C. (Jan. 1965). A General Purpose Macro- generator. The Computer Journal 8(3):225–241. issn: 0010-4620. doi: 10.1093/comjnl/8.3.225. Swift, Theresa et al. (Oct. 2017). The XSB System, Version 3.8.x, Volume 1: ’s Manual. Triance, J. M. (1980). Structured Programming in COBOLThe Current Options. The Computer Journal 23(3):194–200. doi: 10.1093/comjnl/ 23.3.194. Walden, David (2014). Macro Memories, 19642013. TUGboat: The Communications of the TEX Users Group 35(1):99–110. Zmiry, Iddo E. (Apr. 2016). “Lola 0.064: A Pro- gramming Language for Augmenting Program- ming Languages”. MA thesis. Technion Israel Institute of Technology.

4