Mgradman1.Pdf
Total Page:16
File Type:pdf, Size:1020Kb
Gradman - 1 Literate Programming by Mike Gradman Introduction Computer programming has often seen its share of code that is hard to understand and read due to poor documentation and style. Unfortunately, this not so straightforward code is found in most programs. Frequently, programmers write code that only they can understand, not considering others who may work with the code or read through it. Even if the programmer tries to clearly explain what he is doing through writing comments in the code or uses good style, others may not be able to make sense out of the code. Nasty bugs might pop up that would be hard to fix or a developer might need to know what is going on inside a particular piece of code. He might have to nag the original programmer to clarify points about the program if he is even reachable at all. Also, in the software development process, either management, the designers, or programmers have to keep up with many different kinds of documents, or artifacts throughout the lifecycle. For example, something in the requirements for the application might either be misunderstood or totally overlooked in the code as the developers have to refer to a separate document to see what needs to be implemented. Similar inconsistencies can occur due to the myriad of documents that are produced in the software life cycle. Most of these problems occur due to a lack of proper understanding of the application being developed or used or the code written for that application. A paradigm called “literate programming” helps developers, teachers, and students in the process of solving various problems and understanding their solutions, especially if they have to build on or otherwise comprehend other people’s work. Programming Languages: Conventional Attempts at Understanding Programs Before Programming Languages In the early years of the Computer Age, programmers had to use machine language to instruct the computers what they wanted them to do, which involved commands such as moving values into registers, manipulating memory, or handling control flow in a complicated manner. At that time, there was no essence of entities or other abstractions that could give the layperson or even expert programmers a true understanding of the problem at hand. FORTRAN Then, in the 1950’s, the first attempts to develop high-level programming languages were made with the aim to allow the programmer to think more in terms of what the program needed to do and how to do it. One of these first languages was FORTRAN, with abstractions such as data types and control structures in the form of simple English language (such as the DO-loop and the IF-THEN statement) which represented common idioms that the programmer could grasp [Mac87]1. For example, if 1 Using a documentation style used in many IEEE journals … take first three letters of last name of first author listed concatenated with the year of publication … article by Edward Jones and Robert Smith from 1998 would be listed as [Jon98]. References list uses this shorthand followed by the citation in MLA format. Gradman - 2 the programmer wanted to double the value of each element of a ten-element array A, he would write: DO 100 I=1, 10 A(I) = A(I) * 2 100 CONTINUE Also, the IF-statement could clearly express limited forms of control such as taking the absolute value of a variable named VALUE: IF (VALUE.LT.0) VALUE = -VALUE Structures like these allowed the developer to state what he wanted to do in a clear and concise fashion by 1950’s standards. Also, the developer could break up his work and design pieces of the application at a time by using subroutines [Mac87] and either try to understand each small part of the program and how it fits in the whole (bottom-up design) or get the general idea of what the program needs to do and then handle more specific matters as he develops the application (top-down design). However, the control structures in FORTRAN were primitive and produced code that was hard to follow (thanks to GOTO’s) along with and difficult to comprehend [Mac87]. COBOL Another attempt in the late 1950’s was COBOL. This language’s main goal was to produce code that resembled the English language in a natural-seeming form. COBOL was meant to be a self-documenting language, code that made it clear on its own what was being done in the program and thus easy to understand. However, the syntax was very verbose which made it hard to program in the first place, especially due to certain divisions of the code. An example of the verbosity of COBOL’s “natural language” is a program that reads customer records and computes average sales, that is sales per call [Bel89]. Several pieces stand out. First, about ten lines of code spell out how the input file is formatted. The file itself is a simple chart that has the customer information and total sales and number of sales calls. More heinous though is the output file, which is another simple chart. This chart summarizes the customer information and has the average sales per call listed. It looks something like: CUSTOMER LIST CUSTOMER CUST. NAME ADDRESS CITY STATE ZIP AVG. SALES/CALL 00001 John Doe 1 Here St. Houston TX 77333 0931.11 00002 Mary Suarez 2 2nd Ave. New York NY 01931 4943.32 However, even this easy output chart takes over thirty lines of code to describe. Here is just a small snippet from the code that describes the format of the output chart: Gradman - 3 01 DETAIL-LINE. 05 FILLER PIC X(2) VALUE SPACES. 05 CUSTOMER-NUMBER PIC 9(5) VALUE ZEROS. 05 FILLER PIC X(3) VALUESPACES. 05 CUSTOMER-NAME PIC X(30) VALUE ZEROS. [some more code … snipped for space] 05 CUSTOMER-SALES PIC 9999.99 VALUE ZEROS. 05 FILLER PIC X(11) VALUE SPACES. [Bel89]. This code describes the formatting for each row of the table and must be specified correctly in order for the chart to properly appear. Also from this same example, the code to populate a line of the table looks like: FORMAT-DETAIL-LINE-MODULE MOVE I-CUSTOMER-NUMBER TO CUSTOMER-NUMBER. MOVE I-CUSTOMER-NAME TO CUSTOMER-NAME. [same for rest of fields of table] DIVIDE I-TOTAL-SALES BY I-SALES-CALLS GIVING AVERAGE-SALES. MOVE AVERAGE-SALES TO CUSTOMER-SALES. MOVE DETAIL-LINE TO 0-PRINTER-RECORD. MOVE 1 TO LINE-SPACING. PREFORM PRINTING-MODULE. [Bel89]. Again, the syntax is very verbose and must be written exactly for the program to work. In other languages, such as C or Pascal, this routine would be much clearer by the use of mathematical notation and assignment statements (In C, the division would be done by a statement like “average_sales = i_total_sales / i_sales_calls”). From the customer sales example, one can see that COBOL’s “natural language” was really hard to understand and use instead of the program being self-documenting and easy to comprehend. Thus, even though the ideals established for COBOL were in the direction of making code easier to understand, the realization of that goal fell well short of its expectations. Procedural Programming Over the next twenty to thirty years, other languages appeared which made code easier to understand and work with. Algol provided more intuitive control structures such as the FOR and WHILE loops and the IF-ELSE statement, eliminating the need for the GOTO’s which would have made code harder to follow. Pascal and Ada added Gradman - 4 facilities to declare abstract data types (ADT’s) [Mac87], adding yet another way to make applications easier to design and understand. For example, in Pascal a programmer can declare new data types using the “type” keyword: {Example taken from [Mac87] and modified with comments} type { a person type } person = record name: string; age: 16..100; {age can be from 16 to 100} salary: 10000..100000; {salary in dollars} sex: (male, female); {an employee is either male or female} birthdate: date; {date is also an ADT} hiredate: date; end; string = packed array [1..30] of char; { a date type } date = record mon: month; {month is an ADT} day: 1..31; year: 1900..2000; end; {months of the year} month = (Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec); The programmer can declare variables of these types and give them values they understand like this: var newhire: person; {newhire is a person} begin {new person is a female of age 25 and was hired on June 1, 2000} {which is easy to see from the code below} newhire.age = 25; newhire.sex = female; newhire.hiredate.mon = Jun; newhire.hiredate.day = 1; newhire.hiredate.day = 2000; end. Gradman - 5 These languages are classified as “procedural”, which encourages the use of procedures and functions to implement common tasks which must be done repeatedly in the code. Procedural programming also encourages both top-down and bottom-up design. Like with COBOL, the aim was clear, self-documenting code. However, with more complex problems, the ability to comprehend the code declined sharply. Numerous procedure or function calls could make the code resemble FORTRAN to an extent as control would jump all over the place: {this is “pseudocode”, code that doesn’t necessary follow the syntax of any } {programming language, but is used to convey ideas to the reader } program jumps; {A, B, C, and D are procedures} procedure A; begin if (some condition is true) then C; {procedure C is called} else if (something else is true) then B; else if (yet another thing is true) then D; else writeln(“We’re done!”); end; procedure B; begin if (proposition is true) then D; else C; end; procedure C; begin if (case is true) then set values such no conditions in procedure A will be satisfied; A; else D; end; Gradman - 6 procedure D; begin if (some situation did occur) then do some processing; B; {things may have changed … so let’s call B again} end; { main program } begin A; end.