David R. Hanson — «C Interfaces and Implementations

1 INTRODUCTION big program is made up of many small modules. These modules provide the functions, procedures, and data structures used in A the program. Ideally, most of these modules are ready-made and come from libraries; only those that are specific to the application at hand need to be written from scratch. Assuming that library code has been tested thoroughly, only the application-specific code will contain bugs, and debugging can be confined to just that code. Unfortunately, this theoretical ideal rarely occurs in practice. Most programs are written from scratch, and they use libraries only for the lowest level facilities, such as I/O and memory management. Program- mers often write application-specific code for even these kinds of low- level components; it’s common, for example, to find applications in which the C library functions malloc and free have been replaced by custom memory-management functions. There are undoubtedly many reasons for this situation; one of them is that widely available libraries of robust, well designed modules are rare. Some of the libraries that are available are mediocre and lack standards. The C library has been standardized since 1989, and is only now appear- ing on most platforms. Another reason is size: Some libraries are so big that mastering them is a major undertaking. If this effort even appears to be close to the effort required to write the application, programmers may simply reim- plement the parts of the library they need. User-interface libraries, which have proliferated recently, often exhibit this problem. Library design and implementation are difficult. Designers must tread carefully between generality, simplicity, and efficiency. If the routines and data structures in a library are too general, they may be too hard to 1 2 INTRODUCTION use or inefficient for their intended purposes. If they’re too simple, they run the risk of not satisfying the demands of applications that might use them. If they’re too confusing, programmers won’t use them. The C library itself provides a few examples; its realloc function, for instance, is a marvel of confusion. Library implementors face similar hurdles. Even if the design is done well, a poor implementation will scare off users. If an implementation is too slow or too big — or just perceived to be so — programmers will design their own replacements. Worst of all, if an implementation has bugs, it shatters the ideal outlined above and renders the library useless. This book describes the design and implementation of a library that is suitable for a wide range of applications written in the C programming language. The library exports a set of modules that provide functions and data structures for “programming-in-the-small.” These modules are suitable for use as “piece parts” in applications or application components that are a few thousand lines long. Most of the facilities described in the subsequent chapters are those covered in undergraduate courses on data structures and algorithms. But here, more attention is paid to how they are packaged and to making them robust. Each module is presented as an interface and its implementation. This design methodology, explained in Chapter 2, separates module specifications from their implementations, promotes clarity and precision in those specifications, and helps provide robust implementations. 1.1 Literate Programs This book describes modules not by prescription, but by example. Each chapter describes one or two interfaces and their implementations in full. These descriptions are presented as literate programs. The code for an interface and its implementation is intertwined with prose that explains it. More important, each chapter is the source code for the interfaces and implementations it describes. The code is extracted automati- cally from the source text for this book; what you see is what you get. A literate program is composed of English prose and labeled chunks of program code. For example, 〈compute x • y〉≡ sum = 0; LITERATE PROGRAMS 3 for (i = 0; i < n; i++) sum += x[i]*y[i]; defines a chunk named 〈compute x • y〉; its code computes the dot product of the arrays x and y. This chunk is used by referring to it in another chunk: 〈function dotproduct〉≡ int dotProduct(int x[], int y[], int n) { int i, sum; 〈compute x • y〉 return sum; } When the chunk 〈function dotproduct〉 is extracted from the file that holds this chapter, its code is copied verbatim, uses of chunks are replaced by their code, and so on. The result of extracting 〈function dotproduct〉 is a file that holds just the code: int dotProduct(int x[], int y[], int n) { int i, sum; sum = 0; for (i = 0; i < n; i++) sum += x[i]*y[i]; return sum; } A literate program can be presented in small pieces and documented thoroughly. English prose subsumes traditional program comments, and isn’t limited by the comment conventions of the programming language. The chunk facility frees literate programs from the ordering con- straints imposed by programming languages. The code can be revealed in whatever order is best for understanding it, not in the order dictated by rules that insist, for example, that definitions of program entities pre- cede their uses. The literate-programming system used in this book has a few more features that help describe programs piecemeal. To illustrate these features and to provide a complete example of a literate C program, the rest 4 INTRODUCTION of this section describes double, a program that detects adjacent identi- cal words in its input, such as “the the.” For example, the UNIX command % double intro.txt inter.txt intro.txt:10: the inter.txt:110: interface inter.txt:410: type inter.txt:611: if shows that “the” occurs twice in the file intro.txt; the second occur- rence appears on line 10; and double occurrences of “interface,” “type,” and “if” appear in inter.txt at the lines shown. If double is invoked with no arguments, it reads its standard input and omits the file names from its output. For example: % cat intro.txt inter.txt | double 10: the 143: interface 343: type 544: if In these and other displays, commands typed by the user are shown in a slanted typewriter font, and the output is shown in a regular typewriter font. Let’s start double by defining a root chunk that uses other chunks for each of the program’s components: 〈double.c 4〉≡ 〈includes 5〉〈data 6〉〈prototypes 6〉〈functions 5〉 By convention, the root chunk is labeled with the program’s file name; extracting the chunk 〈double.c 4〉 extracts the program. The other chunks are labeled with double’s top-level components. These components are listed in the order dictated by the C programming language, but they can be presented in any order. The 4 in 〈double.c 4〉 is the page number on which the definition of the chunk begins. The numbers in the chunks used in 〈double.c 4〉 are the LITERATE PROGRAMS 5 page numbers on which their definitions begin. These page numbers help readers navigate the code. The main function handles double’s arguments. It opens each file and calls doubleword to scan the file: 〈functions 5〉≡ int main(int argc, char *argv[]) { int i; for (i = 1; i < argc; i++) { FILE *fp = fopen(argv[i], "r"); if (fp == NULL) { fprintf(stderr, "%s: can't open '%s' (%s)\n", argv[0], argv[i], strerror(errno)); return EXIT_FAILURE; } else { doubleword(argv[i], fp); fclose(fp); } } if (argc == 1) doubleword(NULL, stdin); return EXIT_SUCCESS; } 〈includes 5〉≡ #include <stdio.h> #include <stdlib.h> #include <errno.h> The function doubleword needs to read words from a file. For the purposes of this program, a word is one or more nonspace characters, and case doesn’t matter. getword reads the next word from an opened file into buf[0..size-1] and returns one; it returns zero when it reaches the end of file. 〈functions 5〉+≡ int getword(FILE *fp, char *buf, int size) { int c; c = getc(fp); 〈scan forward to a nonspace character or EOF 6〉 6 INTRODUCTION 〈copy the word into buf[0..size-1] 7〉 if (c != EOF) ungetc(c, fp); return 〈found a word? 7〉; } 〈prototypes 6〉≡ int getword(FILE *, char *, int); This chunk illustrates another literate programming feature: The +≡ that follows the chunk labeled 〈functions 5〉 indicates that the code for getword is appended to the code for the chunk 〈functions 5〉, so that chunk now holds the code for main and for getcode. This feature permits the code in a chunk to be doled out a little at a time. The page number in the label for a continued chunk refers to the first definition for the chunk, so it’s easy to find the beginning of a chunk’s definition. Since getword follows main, the call to getword in main needs a pro- totype, which is the purpose of the 〈prototypes 6〉 chunk. This chunk is something of a concession to C’s declaration-before-use rule, but if it is defined consistently and appears before 〈functions 5〉 in the root chunk, then functions can be presented in any order. In addition to plucking the next word from the input, getword incre- ments linenum whenever it runs across a new-line character. doubleword uses linenum when it emits its output. 〈data 6〉≡ int linenum; 〈scan forward to a nonspace character or EOF 6〉≡ for ( ; c != EOF && isspace(c); c = getc(fp)) if (c == '\n') linenum++; 〈includes 5〉+≡ #include <ctype.h> The definition of linenum exemplifies chunks that are presented in an order different from what is required by C.

Load more