
ChocoPy: A Programming Language for Compilers Courses Rohan Padhye Koushik Sen Paul N. Hilfinger [email protected] [email protected] [email protected] University of California, Berkeley University of California, Berkeley University of California, Berkeley USA USA USA Abstract 1 Introduction ChocoPy is a programming language designed for teaching ChocoPy is a programming language designed for classroom an undergraduate course on programming languages and use. It is currently used at UC Berkeley to teach CS164, an compilers. ChocoPy is a restricted subset of Python 3.6, using undergraduate-level course on the design and implemen- static type annotations to enforce compile-time type safety. tation of programming languages. ChocoPy is a statically ChocoPy is fully specified using formal grammar, typing typed, restricted subset of Python 3.6. ChocoPy is fully spec- rules, and operational semantics. Valid ChocoPy programs ified using formal descriptions of syntax, typing, andop- can be executed in a standard Python interpreter, producing erational semantics. Students taking CS164 learn to reason results consistent with ChocoPy semantics. A major com- about such formalisms, consider alternatives, and extend ponent of CS164 at UC Berkeley is the project: students de- them to support additional features of Python. Over the velop a full compiler for ChocoPy, targeting RISC-V, in about course of a semester, students work in teams to develop a twelve weeks. In other exercises, students extend the syntax, full compiler for ChocoPy, targeting the RISC-V architec- type system, and formal semantics to support additional fea- ture. We provide to students a language reference manual, tures of Python. In this paper, we outline (1) the motivations a RISC-V implementation guide, and skeleton code for de- for creating the ChocoPy project, (2) salient features of the veloping a modular ChocoPy compiler in Java. Additional language, (3) the resources provided to students to develop resources include an instructor-provided reference compiler, their compiler, (4) some insights gained from teaching two a web-based ChocoPy IDE, a web-based RISC-V simulator semesters of ChocoPy-based courses by different instructors. with step-through assembly debugging, and an auto-grader. Our assignment resources are available for re-use by other The ChocoPy compiler project is developed in three modular instructors and institutions. stages that can be tested and graded independently of each other. The ChocoPy project is designed to be portable. All CCS Concepts • Social and professional topics → Com- ChocoPy resources are available at https://chocopy.org. puter science education; • Software and its engineer- ing → Compilers; Context specific languages. 2 Motivation Keywords Compilers courses, Python, RISC-V Waite [11] describes three common strategies for teaching an undergraduate compilers course: (1) software project, (2) ACM Reference Format: application of theory, (3) support for communicating with a Rohan Padhye, Koushik Sen, and Paul N. Hilfinger. 2019. ChocoPy: compiler. At UC Berkeley, the CS164 course is a mix of the A Programming Language for Compilers Courses. In Proceedings first two strategies. Students are exposed to various theoreti- of the 2019 ACM SIGPLAN SPLASH-E Symposium (SPLASH-E ’19), cal concepts underpinning programming language design— October 25, 2019, Athens, Greece. ACM, New York, NY, USA,5 pages. https://doi.org/10.1145/3358711.3361627 such as syntax, type systems, and formal semantics—as well as their efficient implementation—such as lexing, parsing, type checking, program analysis and optimization, code gen- Permission to make digital or hard copies of all or part of this work for eration, and memory management. These concepts concur- personal or classroom use is granted without fee provided that copies rently tie in to a semester-long project on developing a full are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights compiler for a small but non-trivial programming language. for components of this work owned by others than the author(s) must For students, developing a compiler end-to-end can be a be honored. Abstracting with credit is permitted. To copy otherwise, or hugely rewarding task. The rewards are both in the form of republish, to post on servers or to redistribute to lists, requires prior specific a substantial software engineering experience—the project permission and/or a fee. Request permissions from [email protected]. is often the largest that students have undertaken so far— SPLASH-E ’19, October 25, 2019, Athens, Greece as well as in gaining insights about programming language © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. design and implementation. ACM ISBN 978-1-4503-6989-3/19/10...$15.00 The major design decision for instructors is what pro- https://doi.org/10.1145/3358711.3361627 gramming language to base the compiler project on. The 41 SPLASH-E ’19, October 25, 2019, Athens, Greece Rohan Padhye, Koushik Sen, and Paul N. Hilfinger advantage of using a pedagogical language such as COOL [2] adapted to conform to Python 3.6. See §3 for more or SOOL [4], is the availability of formal specifications for details. a set of language features designed specifically for educa- 3. The ChocoPy reference manual contains a formal spec- tion. One of the co-authors of this paper has implemented ification of the language’s syntax (tokenization rules+ this approach for several years. However, students are not grammar), comprehensive typing rules for a type sys- always motivated to learn about the design of or write a tem based on nominal subtyping, as well as operational compiler for a language having unfamiliar syntax or whose semantics for all language constructs. Homework ex- features are not relatable to languages they already know; ercises lead students towards extending the syntax, this phenomenon has been observed by other instructors as typing rules, and formal semantics to support addi- well [1, 11]. An alternate approach is to specify a subset of tional Python features such as exceptions, dictionaries, a popular language, such as Java [3, 6, 8] or C [7]. Another list comprehensions, closures, etc. co-author of this paper has had success with this approach in 4. We provide resources to aid students in implementing the past. However, existing subset definitions either did not a compiler for ChocoPy that targets the 32-bit RISC-V have formal semantics attached to them, were not sufficiently instruction-set architecture [12]. In particular, we pro- complex enough for reasoning about language design issues, vide infrastructure to emit auto-commented assembly and/or did not have associated resources for developing a code that conforms to conventions listed in an accom- compiler targeting a modern instruction-set architecture. panying implementation guide. We also make use of a The development of the ChocoPy project was motivated RISC-V simulator, which allows step-through debug- by the following design goals: ging of RISC-V assembly in a web browser. See §4 for more details. 1. We wanted to use a subset of a widely used program- 5. All our artifacts are being made available as a package ming language, preferably one which students are al- of three assignments, complete with documentation ready familiar with. and auto-graders, for re-use by other instructors upon 2. We wanted the language to be expressive enough to request. write non-trivial programs in. In particular, we wanted to support an object-oriented paradigm with sufficient We soon discovered an additional advantage of basing a complexity to illustrate important nuances of static compiler project around a type-safe subset of a highly dy- type checking and efficient code generation. We de- namic language such as Python. On non-trivial benchmarks, cided to use the features of COOL as a reference. student-implemented compilers can easily outperform the 3. We wanted to use a language whose syntax, type- official Python implementation! We found this to be anex- checking rules, and operational semantics were for- cellent source of motivation for students. mally specified. These concepts tie the theory compo- nent taught in class to practical aspects of compiler 3 The ChocoPy Language development. ChocoPy is designed to be a subset of Python. An execution 4. We wanted the students’ compilers to target a mod- of a valid ChocoPy program that does not result in a run- ern assembly language, while providing sufficient tool time error should produce the same observable result as the support for simplifying such a daunting task. execution of that program in a standard Python interpreter. 5. We wanted to produce artifacts that can be re-used Program statements can contain expressions, assignments, across instructors and institutions offering compilers and control-flow statements such as conditionals and loops. courses. Evaluation of an expression results in a value that can be an ChocoPy fulfills these goals in the following way: integer, a boolean, a string, an object of a user-defined class, a list, or the special value None. ChocoPy does not support 1. ChocoPy is a statically typed subset of Python 3.6. We dictionaries, first-class functions, and reflective introspec- found that, as of 2018, most students taking CS164 tion. All expressions are statically typed. Variables (global were already familiar with writing Python programs.
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages5 Page
-
File Size-