Computer Science Education: Where Are the Software Engineers of Tomorrow?

Open Forum
CROSSTALK: The Journal of Defense Software Engineering, January 2008

By Dr. Robert B.K. Dewar and Dr. Edmond Schonberg
AdaCore Inc.

It is our view that Computer Science (CS) education is neglecting basic skills, in particular in the areas of programming and formal methods. We consider that the general adoption of Java as a first programming language is in part responsible for this decline. We examine briefly the set of programming skills that should be part of every software professional’s repertoire.

It is all about programming! Over the last few years we have noticed worrisome trends in CS education. The following represents a summary of those trends:

1. Mathematics requirements in CS programs are shrinking.
2. The development of programming skills in several languages is giving way to cookbook approaches using large libraries and special-purpose packages.
3. The resulting set of skills is insufficient for today’s software industry (in particular for safety and security purposes) and, unfortunately, matches well what the outsourcing industry can offer. We are training easily replaceable professionals.

These trends are visible in the latest curriculum recommendations from the Association for Computing Machinery (ACM). Curriculum 2005 does not mention mathematical prerequisites at all, and it mentions only one course in the theory of programming languages [1].

We have seen these developments from both sides: As faculty members at New York University for decades, we have regretted the introduction of Java as a first language of instruction for most computer science majors. We have seen how this choice has weakened the formation of our students, as reflected in their performance in systems and architecture courses. As founders of a company that specializes in Ada programming tools for mission-critical systems, we find it harder to recruit qualified applicants who have the right foundational skills. We want to advocate a more rigorous formation, in which formal methods are introduced early on, and programming languages play a central role in CS education.

Formal Methods and Software Construction

Formal techniques for proving the correctness of programs were an extremely active subject of research 20 years ago. However, the methods (and the hardware) of the time prevented these techniques from becoming widespread, and as a result they are more or less ignored by most CS programs. This is unfortunate because the techniques have evolved to the point that they can be used in large-scale systems and can contribute substantially to the reliability of these systems. A case in point is the use of SPARK in the re-engineering of the ground-based air traffic control system in the United Kingdom (see a description of iFACTS – Interim Future Area Control Tools Support, at <www.nats.co.uk/article/90>). SPARK is a subset of Ada augmented with assertions that allow the designer to prove important properties of a program: termination, absence of run-time exceptions, finite memory usage, etc. [2]. It is obvious that this kind of design and analysis methodology (dubbed Correctness by Construction) will add substantially to the reliability of a system whose design has involved SPARK from the beginning. However, PRAXIS, the company that developed SPARK and which is designing iFACTS, finds it hard to recruit people with the required mathematical competence (and this is present even in the United Kingdom, where formal methods are more widely taught and used than in the United States).

Another formal approach to which CS students need exposure is model checking and linear temporal logic for the design of concurrent systems. For a modern discussion of the topic, which is central to mission-critical software, see [3].

Another area of computer science which we find neglected is the study of floating-point computations. At New York University, a course in numerical methods and floating-point computing used to be required, but this requirement was dropped many years ago, and now very few students take this course. The topic is vital to all scientific and engineering software and is semantically delicate. One would imagine that it would be a required part of all courses in scientific computing, but these often take MatLab to be the universal programming tool and ignore the topic altogether.

The Pitfalls of Java as a First Programming Language

Because of its popularity in the context of Web applications and the ease with which beginners can produce graphical programs, Java has become the most widely used language in introductory programming courses. We consider this to be a misguided attempt to make programming more fun, perhaps in reaction to the drop in CS enrollments that followed the dot-com bust. What we observed at New York University is that the Java programming courses did not prepare our students for the first course in systems, much less for more advanced ones. Students found it hard to write programs that did not have a graphic interface, had no feeling for the relationship between the source program and what the hardware would actually do, and (most damaging) did not understand the semantics of pointers at all, which made the use of C in systems programming very challenging.

Let us propose the following principle: The irresistible beauty of programming consists in the reduction of complex formal processes to a very small set of primitive operations. Java, instead of exposing this beauty, encourages the programmer to approach problem-solving like a plumber in a hardware store: by rummaging through a multitude of drawers (i.e. packages) we will end up finding some gadget (i.e. class) that does roughly what we want. How it does it is not interesting! The result is a student who knows how to put a simple program together, but does not know how to program. A further pitfall of the early use of Java libraries and frameworks is that it is impossible for the student to develop a sense of the run-time cost of what is written because it is extremely hard to know what any method call will eventually execute. A lucid analysis of the problem is presented in [4].

We are seeing some backlash to this approach. For example, Bjarne Stroustrup reports from Texas A & M University that the industry is showing increasing unhappiness with the results of this approach. Specifically, he notes the following:

"I have had a lot of complaints about that [the use of Java as a first programming language] from industry, specifically from AT&T, IBM, Intel, Bloomberg, NI, Microsoft, Lockheed-Martin, and more." [5]

He noted in a private discussion on this topic, reporting the following:

"It [Texas A&M] did [teach Java as the first language]. Then I started teaching C++ to the electrical engineers and when the EE students started to out-program the CS students, the CS department switched to C++." [5]

It will be interesting to see how many departments follow this trend. At AdaCore, we are certainly aware of many universities that have adopted Ada as a first language because of similar concerns.

A Real Programmer Can Write in Any Language (C, Java, Lisp, Ada)

[...] tree manipulation libraries in Ada, and garbage collection in anything but Java. The study of a wide variety of languages is, thus, indispensable to the well-rounded programmer.

Why C Matters

C is the low-level language that everyone must know. It can be seen as a portable assembly language, and as such it exposes the underlying machine and forces the student to understand clearly the relationship between software and hardware. Performance analysis is more straightforward, because the cost of every software statement is clear. Finally, compilers (GCC for example) make it easy to examine the generated assembly code, which is an excellent tool for understanding machine language and architecture.

Why C++ Matters

C++ brings to C the fundamental concepts of modern software engineering: encapsulation with classes and namespaces, information hiding through protected and private data and operations, programming by extension through virtual methods and derived classes, etc. C++ also pushes storage management as far as it can go without full-blown garbage collection, with constructors and destructors.

Why Lisp Matters

[...]

1. An understanding of concurrent programming (for which threads provide a basic low-level model).
2. Reflection, namely the understanding that a program can be instrumented to examine its own state and to determine its own behavior in a dynamically changing environment.

Why Ada Matters

Ada is the language of software engineering par excellence. Even when it is not the language of instruction in programming courses, it is the language chosen to teach courses in software engineering. This is because the notions of strong typing, encapsulation, information hiding, concurrency, generic programming, inheritance, and so on, are embodied in specific features of the language. From our experience and that of our customers, we can say that a real programmer writes Ada in any language. For example, an Ada programmer accustomed to Ada’s package model, which strongly separates specification from implementation, will tend to write C in a style where well-commented header files act in somewhat the same way as package specs in Ada. The programmer will include bounds checking and consistency checks when passing mutable structures between subprograms to mimic the strong-typing checks that Ada mandates [6].
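
A minimal sketch of what such Ada-flavored C might look like, under stated assumptions: the names bounded_buffer, buffer_is_valid, buffer_append and buffer_get, and the capacity of 8, are hypothetical and chosen only for illustration. The commented declarations play the role of a package spec (in a real project they would sit in a header file), and the definitions that follow play the role of the package body. Each operation checks the structure's consistency and the index bounds before touching memory, which only loosely imitates the checks an Ada compiler would insert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ---- "Package spec": would normally live in a header file ---- */
/* All names below are hypothetical, for illustration only.       */

#define BUFFER_CAPACITY 8

/* A fixed-capacity buffer of ints.
 * Invariant: length <= BUFFER_CAPACITY. */
typedef struct {
    size_t length;
    int    items[BUFFER_CAPACITY];
} bounded_buffer;

/* Consistency check for a structure passed between subprograms. */
bool buffer_is_valid(const bounded_buffer *buf);

/* Precondition: buf is valid.
 * Appends value and returns true; returns false if the buffer is full. */
bool buffer_append(bounded_buffer *buf, int value);

/* Precondition: buf is valid.
 * Copies items[index] into *out and returns true; returns false
 * (and leaves *out untouched) if index is out of range. */
bool buffer_get(const bounded_buffer *buf, size_t index, int *out);

/* ---- "Package body": would normally live in a .c file ---- */

bool buffer_is_valid(const bounded_buffer *buf)
{
    return buf != NULL && buf->length <= BUFFER_CAPACITY;
}

bool buffer_append(bounded_buffer *buf, int value)
{
    assert(buffer_is_valid(buf));      /* consistency check on entry */
    if (buf->length == BUFFER_CAPACITY) {
        return false;                  /* refuse to write past the end */
    }
    buf->items[buf->length++] = value;
    return true;
}

bool buffer_get(const bounded_buffer *buf, size_t index, int *out)
{
    assert(buffer_is_valid(buf));      /* consistency check on entry */
    if (out == NULL || index >= buf->length) {
        return false;                  /* explicit bounds check */
    }
    *out = buf->items[index];
    return true;
}
```

A caller would declare `bounded_buffer b = { 0 };`, append with `buffer_append(&b, 42)`, and read back with `buffer_get(&b, 0, &v)`, testing the boolean result each time. Returning a status instead of raising an exception is a design choice of this sketch; Ada itself would raise Constraint_Error on an out-of-range index.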