C0, an Imperative Programming Language for Novice Computer Scientists

C0, an Imperative Programming Language for Novice Computer Scientists Rob Arnold CMU-CS-10-145 December 2010 School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Thesis Committee: Frank Pfenning, Chair Mark Stehlik Submitted in partial fulfillment of the requirements for the degree of Master of Science. Copyright c 2010 Rob Arnold This work was partially supported by the Computational Thinking Center at Carnegie Mellon University, sponsored by Microsoft Research. Keywords: c, programming language, education Abstract The primary goal of a computer science course on data structures and algorithms is to educate students on the fundamental abstract concepts and expose them to con- crete examples using programming assignments. For these assignments, the programming language should not dominate the assignment but instead complement it so that students spend most of their time engaged in learning the course material. Choosing an appropriate programming language requires consideration not just of the course material and the programs students will write but also their prior programming experience and the position of the course in the course sequence for their degree program. We have developed a new programming language, C0, designed for Carnegie Mellon’s new Principles of Imperative Programming course. C0 is almost a subset of C, eschewing complex semantics and undefined or unspecified behaviors in favor of simple semantics and formally specified behavior. It provides features such as garbage collection and array bounds checks to aid students in developing and reasoning about their programs. In addition to the compiler and runtime, we have also developed a small set of libraries for students to use in their assignments. iv Acknowledgments There are many people who supported me in my work. First, I would like to thank my advisor Frank Pfenning for his guidance, feedback, and funding this past year. I would also like to thank Mark Stehlik for serving on my thesis committee. Rob Simmons, William Lovas, and Roger Wolff were willing at all times to discuss all ideas and implementation ideas. Michael Sullivan, Max Buevich, and Dan Schafer also gave me good feedback during the development of C0. Their feedback was invaluable in helping me in this endeavor. I want to extend a special thanks to Deborah Cavlovich for helping me sort out all the administrative details and issues. Finally I want to thank my parents, sister and grandmother for their support of my education over the years. vi Contents 1 Introduction 1 1.1 Motivation...................................... 1 1.2 LanguageRequirements ............................. 2 1.2.1 Simplicity.................................. 3 1.2.2 FriendlyErrorMessages . 5 1.2.3 Similarity to a Real Industrial Language . ...... 5 2 Language Description 7 2.1 Basicdatatypesandexpressions . ..... 7 2.1.1 Booleans .................................. 7 2.1.2 Integers................................... 8 2.1.3 StringsandCharacters . 9 2.1.4 Comparisons ................................ 9 2.2 Variables&Statements . 10 2.3 Functions ...................................... 11 2.4 Conditionals..................................... 13 2.5 TheHeap&Pointers ................................ 14 2.6 Arrays........................................ 15 2.7 Structures ...................................... 16 2.8 Typedef ....................................... 17 2.9 Reasoning...................................... 17 2.10 ComparisonwithC ................................. 18 2.10.1 Differences ................................. 18 2.10.2 UnionSupport ............................... 19 2.10.3 Lackof&forlocalvariables . 21 2.11Analysis....................................... 21 3 Properties 23 3.1 Concretesyntax................................... 23 3.1.1 Abstractsyntax............................... 27 3.2 Elaboration to Abstract Syntax . ...... 30 3.3 Proofofsafety................................... 31 vii 4 Implementation 33 4.1 Compiler ...................................... 33 4.2 Runtimes ...................................... 34 5 Standard Libraries 35 5.1 DesignPrinciples ................................ 35 5.2 Design........................................ 35 5.2.1 Input/Output ................................ 36 5.2.2 DataManipulation ............................. 36 5.2.3 Images ................................... 37 5.2.4 Windowing ................................. 37 5.2.5 Graphics .................................. 37 5.3 Implementation .................................. 37 5.3.1 NativeDependencies . 37 6 Related and Future Work 39 6.1 Relatedwork .................................... 39 6.2 Futureworkandspeculation . 39 6.2.1 Static prover for assertions . 39 6.2.2 Module system & Polymorphism . 40 6.2.3 Precisecollector .............................. 40 6.2.4 Dynamically check explicit deallocation . ........ 40 6.2.5 Convenient printf support ........................ 40 6.2.6 Formatting/style checker . 40 6.2.7 Interactivedebugger . 41 6.2.8 Feedback on the course and its use of C0 ................. 41 A Language Definition 43 A.1 Syntax........................................ 43 A.2 Rules ........................................ 47 A.3 Safety ........................................ 62 A.3.1 Progress................................... 62 A.3.2 Preservation................................. 62 A.4 Lemmas....................................... 62 A.5 Proofs........................................ 65 A.5.1 Progress................................... 65 A.5.2 Preservation................................. 73 B Standard Libraries 103 B.1 Input/Output..................................... 103 B.1.1 conio ...................................103 B.1.2 file ....................................103 B.1.3 args ....................................104 B.2 Datamanipulation................................. 105 viii B.2.1 parse ...................................105 B.2.2 string ..................................105 B.3 Images........................................107 B.3.1 img .....................................107 C Code Listing 111 C.1 Sample c0defs.h .................................111 C.2 C0 runtimeinterface ................................111 Bibliography 113 ix x Chapter 1 Introduction Designing a new language is not something one does lightly. There are hundreds of existing languages to choose from with many different properties so it seems reasonable to assume that there must exist some language suitable for teaching introductory data structures. Many of these langauges were not designed with pedagogical purposes in mind. They may have historical or designer quirks and features that no longer (or never did) make sense. They are not usually designed with novice programmers in mind but to accomplish some particular task or set of tasks. In the case of Carnegie Mellon’s introductory data structures course, a suitable language would need to be easily accessible by programmers with minimal experience and it must be able to adequately express the data structures used in the course. 1.1 Motivation Carnegie Mellon is revising its early undergraduate curriculum. In particular, they are bringing Jeanette Wing’s [10] idea of computational thinking to both majors and nonmajors, increase focus on software reliability and integrate parallelism into the core curriculum. These changes involve restructuring the course sequence and content. The first course, 15-110, in the sequence is intended for nonmajors and majors with no programming experience. It is intended to be an introductory computer science course containing the basic principles including an emphasis on computational thinking. The next course, 15-122, focuses on writing imperative programs with fundamental data structures and algorithms and reasoning about them. Students also take a functional programming course, 15-150, which focuses on working with immutable data and the concepts of modularity and abstraction. Following 15- 122 and 15-150, students take 15-210 where they continue to learn more advanced data structures as in 15-122 though with an additional emphasis on parallel programming. 15-213 remains an introductory systems course and 15-214 focuses on designing large-scale systems and includes content on object-oriented programming and distributed concurrency. For this thesis, we focus on 15-122 and the choice of programming language for its assignments. 1 1.2 Language Requirements To formulate the requirements, we must first examine the types of programs that students will be assigned to write for the course. The assignments are intended to assess and educate students according to the learning goals of the course. The majority of students taking 15-122 will have scored well on the AP test which currently uses Java. Those that have not will have taken a semester-long basic programming course taught using Python as the primary language for assignments. Though students should be able to write basic programs upon entry to this course, we do not assume that they will have had much experience in debugging or using the intermediate or advanced features of either language. This course is intended to instruct students in the basic data structures and algorithms used in computer science. This includes: • Stacks and Queues • Binary Search • Sorting • Priority queues • Graph traversal / shortest path • Balanced binary trees • Game tree search • String searching By the end of the course, students should be able to: • For the aforementioned data structures,

Load more