
CS 61C vs. CS 61AB

The CS 61 series is an introduction to computer science, with particular emphasis on software and on machines from a programmer's point of view. The first two courses considered programming at a high level of abstraction, introducing a range of programming paradigms and common techniques. This course, the last in the series, concentrates on machines and how they carry out the programs you write.

The main topics of CS 61CL involve the low-level system software and the hardware organization of a "logical machine": not the actual electronic circuits, but the computational operations that those circuits carry out. To make these ideas concrete, you will study the structure of a particular computer, the MIPS R3000 processor, in some detail, down to the level of the design of the processor's on-chip components.

Course outline

Topics in the first half of CS 61CL will be covered roughly in the sequence specified below.

Week 1: introduction to C
Week 2: C arrays and pointers
Week 3: dynamic storage allocation
Week 4: introduction to assembly language
Week 5: assembly language translation of C operations
Week 6: machine language
Week 7: floating-point representations

Topics to be covered in the remainder of the course include the following:

input and output
linking and loading
logic design
CPU design
C memory management
virtual memory
caches

A notebook for CS 61CL

We encourage you to buy a notebook specifically for CS 61CL and bring it to lab every day. TAs will want a place to write answers to questions you ask. You will want a place to keep track of issues or techniques that arise in lab. The notebook is also a good place to keep track of errors you make, so that you don't make them again.

Overview

For those of you who are new to the UC-WISE learning environment, this page (http://inst.eecs.berkeley.edu/~cs61cl/fa07/misc/intro.to.UC-WISE.html) provides an overview.

Why C? C arose from work on UNIX at Bell Labs in the late 1960s. It filled two main needs. First, it is a relatively high-level language that also allows flexible low-level access to memory and other machine resources. Second, it is relatively easy to write a compiler for, so it wasn't long before implementations of C were available on a variety of computers. (This contributed greatly to the spread of UNIX.)

In CS 61CL, we'll be focusing on two aspects of C. First, there are the things you need to know to be an everyday C programmer; you'll write some of the CS 61CL projects in C, so we hope these hints will help you with those programs. Second, C provides easy access to other aspects of the CS 61CL content, such as how data are represented in memory.

C vs. Java

To a first approximation, C is just Java without the object-oriented features. Here's a Java program that veterans of CS 61B have probably encountered.

    public class Account {

        public Account (int amount) {
            myAmount = amount;
        }

        public void deposit (int amount) {
            myAmount = myAmount + amount;
        }

        public int balance ( ) {
            return myAmount;
        }

        private int myAmount;
    }

Crossing out mentions of class and public/private leaves an almost legal C program:

Format of a C program

Here's an excerpt of a program we'll see later in this lab session, along with some explanatory notes.

    #include <stdio.h>

    int main ( ) {
        unsigned int exp = 1;
        int k;
        /* Compute 2 to the 31st. */
        for (k=0; k<31; k++) {
            exp = exp * 2;
        }
        ...
        return 0;
    }

Notes:

#include is loosely similar to the import construct in Java. Mechanically, the C compiler replaces the #include line by the contents of the named file. Even so, well-designed include files are an important aspect of good C programming style. Important system programming interfaces are described in include files whose names are surrounded by angle brackets, as in #include <stdio.h>; the angle brackets tell the compiler to look for the file in a set of system directories. User-defined data types and interfaces, such as those you will write, are described in include files whose names are surrounded by double quote marks, which say to look in the user's directory for the file.

As in Java, main is called by the operating system to run the program. Unlike in Java, main is an int-valued function that returns a success-failure code (0 means success, anything else means failure). Command-line arguments may be, but need not be, declared as parameters of main. Many C compilers also accept main declared as void, and K&R contains numerous examples in which main is declared with no return type at all; declaring main to return int, as here, is the form the C standard specifies.

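C requires declaration before use. Moreover, in traditional C, all declarations in a group of statements must appear right after the opening left brace, before any statements. The 1999 C standard relaxes this rule, but the traditional style is still common.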

C uses "/*" and "*/" as comment delimiters, as does Java. Recent versions of C (starting with the 1999 standard) also allow "//" comments as in Java.

Another example

Here's another example that we'll revisit later in this lab session.

    #include <stdio.h>

    int bitCount (unsigned int n);

    int main ( ) {
        printf ("%d %d\n", 0, bitCount (0));
        printf ("%d %d\n", 1, bitCount (1));
        printf ("%d %d\n", 27, bitCount (27));
        return 0;
    }

    int bitCount (unsigned int n) {
        int k;
        return 13;   /* placeholder result for now */
    }

Notes: Functions should also be declared before they're used. In C, that's done with a function prototype; the line right after the #include is an example. Files whose names end in ".h" (for "header") typically consist of function prototypes and data definitions. Files whose names end in ".c" contain the code for those functions.

The printf function is used to produce output. Its first argument is a format string. Each occurrence of "%" in the format string indicates where one of the subsequent arguments will appear and in what form; for example, %d prints an int in decimal. The C language doesn't require the compiler to check this association between form and argument type (gcc -Wall does warn about many mismatches), as we'll note later.

Some troublesome details

C traditionally has no boolean type. (The 1999 standard added _Bool and the header <stdbool.h>, but conditions still work on integer values.) It uses int instead: 0 means "false", anything else means "true". (This is similar to Scheme, where any value other than #f counts as true.) This feature, combined with assignment within expressions and a looser approach to types in general, leads to a common mistake made by beginning C programmers: using "=" instead of "==" in a comparison causes C to interpret the comparison as an assignment, whose value is the value assigned. Thus, a typical assignment statement might be:

    n = 5;

and a typical conditional might be:

    if (n == 5) { ... do something ... }

But the following is a funny combination of both:

    if (n = 5) { ... do something ... }

It assigns the value 5 to the variable n and tests whether 5 is not equal to zero. (It is: 5 is not 0.) The test always succeeds, which is probably not what the programmer intended. In Java, this causes a compile-time error, since the condition in an if statement must be of type boolean. To be fair, C programmers do make elegant uses of the assign-and-test idiom, but in moderation. For new C programmers, it is usually a bug.

C doesn't check for the failure to initialize variables, or for accessing outside the bounds of an array. These omissions have been the bane of many a C programmer over the years. They are best addressed by good, disciplined programming conventions right from the beginning. Although simple, C is very powerful and allows you to get at anything the machine can do. Without care, that power can be dangerous.

Review of some UNIX commands

First, use the mkdir command to create a directory named lab1 in your home directory. The mkdir command takes a single argument, the name of the directory to create.

Then use the cp command to copy two files from the CS 61CL code directory to the lab1 directory. The cp command takes two arguments: the file to be copied, and the destination file or directory.

The files to be copied are ~cs61cl/code/wc1.c and ~cs61cl/code/.data.txt.

Incidentally, if you're new to UNIX, this web page contains short descriptions of some more useful UNIX commands.

The most important command is man. If you are not sure how a UNIX command works, run man on it. (Yes, man man works too.)

UNIX i/o redirection

The default input source for many UNIX programs is the keyboard. However, this source may be redirected to come from a file, using the < operator. For example, a desk calculator program named dc simulates a calculator that takes input in postfix order (operands before operator). When one runs dc with the command

    dc

input comes from the keyboard. To compute 2 + 3, for instance, one would type

    2 3 + p

("p" prints the result.) To take input instead from a file named dc.cmds, we would type

    dc < dc.cmds

gcc, gdb, and emacs

In lab, we'll be using the gcc C compiler and the gdb debugger. The emacs editor provides features that enhance the capability of gdb, with which you'll shortly get some practice.

Check right now that the gcc command is set up properly for your account, by typing the command to the UNIX shell. Among the entries printed should be one for gcc that includes the options "-Wall", "-g", and "-std=c99". "-Wall" means "Warn about all recognized irregularities"; "-g" causes gcc to store enough information in the executable program for gdb to make sense of it; "-std=c99" enables the use of features added to C in the 1999 standard. Warn your instructor if these options do not appear in an alias entry for gcc.

Let's try it out. Type gcc. It should come back with gcc: no input files. That's right; you didn't specify any. If it comes back with command not found, you have a problem to sort out.

Run it on the input file that you just copied over: gcc wc1.c

Where did the output go? It should have created an executable file—your program. What UNIX command do you use to find this file? How do you run it? What does it do? Ask a labmate if you're not sure of the answers to these questions. gdb is another executable. You can run your program under gdb by typing gdb nameOfAnExecutableFile.

To write your own programs, you'll need an editor. If you're unfamiliar with emacs, or you've forgotten what you learned about it in past CS classes, there should be an emacs "quick reference" document somewhere near your workstation. This document is also available online. Don't hurry through it. Knowing the tools well will save you lots of time later.

All three of these tools come together to make an integrated programming environment. Another document that's available in lab is "Introduction to Using GDB Under Emacs". Take a few minutes to read this document.

A buggy program

The program below, which you should have copied to your lab1 directory, is intended to mimic the behavior of the wc (word count) program supplied as part of the standard UNIX distribution. Giving the command

    wc -w < fileName

prints the number of words (essentially, sequences of non-space characters) in the named file. For example, if a file named data contains the two lines

    a bc def
    ghi j

then the command

    wc -w < data

should produce 5 as output. (Type man wc for more information.)

Here's the buggy program.

    #include <stdio.h>
    #include <ctype.h>

    int main ( ) {
        int wc;
        char c;
        for (wc=1; ; wc++) {
            /* We're about to encounter the wc'th word. Skip leading white space. */
            while ((c = getchar ( )) && c != EOF && isspace (c)) {
            }
            /* Read through characters of the word. */
            while ((c = getchar ( )) && c != EOF && !isspace (c)) {
            }
        }
        printf ("%d\n", wc);
        return 0;
    }

A few notes:

This program processes character data rather than integers.

The getchar function returns the next character from the input source. If no more characters remain, getchar returns the special value EOF, a negative int that is distinct from every character. The isspace function is part of the ctype (short for "character type") library and is accessed via the line #include <ctype.h> near the top of the program.

We hope that nothing else in the program is mysterious. If you are unsure of some aspect of the program, ask a labmate.

Using the gdb debugger

First, compile the wc1 program and run it on the sample data file:

    gcc -Wall wc1.c
    a.out < wc.data.txt

The program has an infinite loop. Type control-C to regain control. To help find out what's wrong with the program, do the following:

1. Compile the program and run it with gdb as described in the "An Introduction to Using GDB Under Emacs" document. This will involve splitting your emacs window in half, then running gdb in the bottom half. Here are the details of doing this:

   a. Edit the wc1.c file with emacs.
   b. Split the window using control-x 2.
   c. In the code window, type meta-x compile; backspace over the make command, and replace it with gcc -Wall -g wc1.c.
   d. In the other window, where you want to run gdb, type meta-x gdb, then complete the command by typing a.out. ("a.out" is the default name of the executable file produced by gcc.)
   e. Move the cursor to the gdb window, click on the command prompt, and type run < wc.data.txt. Observe the program's behavior.
   f. Type control-c to regain control.
   g. Go back to the code window, and continue with step 2 below.

2. Click on the line after the for loop. Set a breakpoint at that line using control-x space, then run the program again in gdb. Type the continue command (you can use "c" to abbreviate it) four times. Then use the next command to single-step through the program to determine what's going wrong. Identify and fix the error.

   Another way to set a breakpoint is to give the "b" command at the "(gdb)" prompt in one of the following two ways. (Use the "list" command to determine line numbers.)

       b function_name
       b line_number

3. Recompile and run the corrected program.

In the next step, explain the bug you found and how you fixed it.

Representation of integers

We will now examine standard representations of nonnegative integers, to make precise the concept of a value "fitting in" a memory location. First, we distinguish between a value and the representation of that value, that is, how it is written down or communicated from one person to another. The integer value 27 may be represented using decimal digits, or Roman numerals (XXVII), or words ("twenty-seven"), or an arithmetic expression (6*3+9).

We are accustomed to using the decimal system for representing numbers. In this system, we represent a value with a sequence of decimal digits, each being one of 0, ..., 9. The position of a digit in this sequence is important: 758 represents a different value from 857. In fact, each position is associated with a weight, and the weights and the digits are combined arithmetically to form the represented value. Finally, it's a based number system, in that each weight is a power of some base value (also referred to as the radix). In decimal, the base is 10. A decimal representation of an integer $M$ is then a sequence of decimal digits $d_{n-1} d_{n-2} \ldots d_1 d_0$ such that

$$M = d_{n-1} \times 10^{n-1} + d_{n-2} \times 10^{n-2} + \cdots + d_1 \times 10^1 + d_0 \times 10^0$$

For example, 758 represents $7 \times 10^2 + 5 \times 10^1 + 8 \times 10^0$.

Inside a computer, it is more convenient to use a base that's a power of 2, for example:

binary: base 2; "digits" are 0 and 1
octal: base 8; "digits" are 0, ..., 7
hexadecimal: base 16; "digits" are 0, ..., 9, A (representing 10), B (representing 11), C, D, E, F (representing 15)

For example, the value that 45 represents in base 10 is represented in binary as 101101, in octal as 55, and in hexadecimal as 2D, since

$$45_{10} = 1 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 5 \times 8^1 + 5 \times 8^0 = 2 \times 16^1 + 13 \times 16^0$$

Conversion between bases that are powers of 2

Any representation using a base that's a power of 2 can easily be converted to binary, by writing out the individual digits in binary. For example: 2B3 base 16 = 0010 1011 0011 base 2

To convert from binary to a base that's $2^a$, split the binary digits into groups of $a$, starting from the rightmost digit (adding leading zeros if needed), and translate each group to its corresponding digit in base $2^a$. For example, we can translate from binary to octal, base $2^3$, as follows: 001010110011 base 2 = 001 010 110 011 base 2 = 1263 base 8

Why powers of 2 are interesting

For reasons that you would learn about in an electronics course (e.g. EE 40 or EE 42), computer memories are organized into collections of circuits that are either "on" or "off". For storage of numeric information, this naturally suggests the use of base 2, with the binary digit (bit) 1 representing "on" and 0 representing "off".

Typical units of computer memory are a byte (8 bits; also used to represent a char in C), a word (32 bits; used to represent an int in C), a halfword (16 bits; used to represent a short int in C or a Unicode character), and a double word (64 bits). With $n$ bits, you can represent $2^n$ different values. Thus with 8 bits, you can represent $256 = 2^8$ different values; 10 bits lets you represent $1024 = 2^{10}$ (1 "K") different values.

An int in C can take on only a limited number of different values. When we try to construct an integer value that's outside that range, and store it in an int, surprises may result.

Experimenting with conversion between bases

Given below is a program that, given a binary representation typed by the user, prints the decimal equivalent. It's in the file ~cs61cl/code/bin2dec.c. Copy the program to your lab1 directory.

The program is missing one statement. Supply it, then test the program. It may help to work through how you would produce its output by hand, if given the digits of a binary representation left to right.

    #include <stdio.h>

    int main ( ) {
        int n;
        char c;
        while (1) {
            /* Translate a new line to decimal.
               Assume that the only characters on the line are zeroes and ones. */
            n = 0;
            c = getchar ( );
            if (c == EOF) {
                break;
            }
            while (c == '1' || c == '0') {
                /* The missing statement goes here. */
                c = getchar ( );
            }
            printf ("in decimal is %d\n", n);
        }
        return 0;
    }

More debugging

The program ~/code/checkPower2.c declares and initializes a variable n, then checks whether its value is a power of 2. Copy the program to your lab1 directory and run it to verify that 32768 is a power of 2 (it's $2^{15}$). Then test the program on several other values:

65536 (a power of 2)

1 (another power of 2)

32767 (not a power of 2)

3 (also not a power of 2)

Finally, test the program on the value 2147483649, which is 1 plus $2^{31}$, the largest power of two that fits in a 32-bit word. Single-step through the computation to find out what happens. In the next step, explain what happened and why.

Printing an int in base 2

The program below, also in ~cs61cl/code/buggy.base2.print.c, is intended to print the base 2 representation of the unsigned value stored in the variable numToPrintInBase2. It has bugs. Fix them.

    #include <stdio.h>

    int main ( ) {
        unsigned int numToPrintInBase2 = 2863311530;   /* alternating 1's and 0's */
        unsigned int exp = 1;
        int k;
        /* Compute the highest storable power of 2 (2 to the 31st). */
        for (k=0; k<31; k++) {
            exp = exp * 2;
        }
        /* For each power of 2 from the highest to the lowest,
           print 1 if it occurs in the number, 0 otherwise. */
        for (k=31; !(k=0); k--) {
            if (numToPrintInBase2 >= exp) {
                putchar ('1');
                numToPrintInBase2 = numToPrintInBase2 - exp;
            } else {
                putchar ('0');
            }
            exp = exp / 2;
        }
        putchar ('\n');
        return 0;
    }

Printing with gdb

Of course, you don't need to write your own programs to print the contents of memory. For example, gdb includes a general printing facility. The command

    print /representation variable

prints the value of the given variable, using the given representation. Among the choices for representation are the following:

    u   unsigned decimal integer
    x   hexadecimal
    o   octal
    t   binary

For example, print /x n would print the contents of n in hexadecimal.

Experiment with this gdb feature, using the simple program below. Set a breakpoint at the return statement, then change values via the print feature you used in the last lab, for example: print n = 25022

Simple program:

    int main ( ) {
        int n = 70;
        return 0;
    }

The char data type

C's char data type represents ASCII characters. (This table (http://www.asciitable.com) lists them all.)

A char is treated by C as just a small integer. One may do arithmetic on char values. When an int is used where a char is expected, the excess leftmost bits are discarded.

Use the online ASCII table to answer the questions in the next step.

Using the od program

The od program interprets its input according to the command line arguments, and prints its interpretation. Recall the file wc.data.txt. Here are the results of running od on that file.

od -b < wc.data.txt prints each byte in octal (base 8):

    0000000 141 040 142 143 040 144 145 146 012 040 040 147 150 151 040 152
    0000020 040 012
    0000022

od -c < wc.data.txt prints each byte as an ASCII character:

    0000000   a       b   c       d   e   f  \n           g   h   i       j
    0000020      \n
    0000022

od -d < wc.data.txt prints each pair of bytes in decimal:

    0000000 24864 25187 08292 25958 02592 08295 26729 08298
    0000020 08202
    0000022

od -x < wc.data.txt prints each pair of bytes in hexadecimal (base 16):

    0000000 6120 6263 2064 6566 0a20 2067 6869 206a
    0000020 200a
    0000022

(These pairs come from a big-endian machine, which treats the first byte of each pair as the more significant one. On a little-endian machine such as an x86 PC, od -d and od -x show each pair with its two bytes swapped, for example 2061 instead of 6120.)

The seven-digit octal value at the start of each line gives the position (offset) in the file of the first byte on that line; the final line gives the total number of bytes (0000022 octal, or 18). man od will give you more information about how the program works.

The program ~cs61cl/code/mysteryout apparently produces a blank line as output. Find out what it really prints using od. Use the UNIX "pipe" facility to send the output of mysteryout to od:

    mysteryout | od options

A powerful idea for CS 61CL

In Java, data of different types are strictly separated. In general, one can use only integer operations on integers, only string operations on strings, and so forth. As we descend through layers of abstraction, however, we lose the notion of types. Data in a low-level computing environment are just collections of bits. The meaning of those bits can't be determined just by looking at them, for the same bits can represent a variety of objects: numbers, characters, even instructions!

The identical treatment of instructions and data (the stored program concept) was a big breakthrough in the early days of computing. The first computers literally had "hard-wired" instructions; to change the program that the computer was running, one had to rewire the computer! The treatment of instructions as data, and the implementation of a general interpreter for those instructions, enabled programmers to climb to higher levels of programming abstraction.