Top-down Perspective: The Computer Organization and Design Underneath the Execution of Programming Language

Mingkai Li
University of Science and Technology of China

Abstract

In Yale Patt’s book Introduction to Computing Systems: From Bits and Gates to C and Beyond, the intricacies of the computing world reveal themselves as a huge, systematically interconnected collection of very simple parts. Although implementations of modern architectures vary greatly in pursuit of shorter response time or greater throughput (sometimes called bandwidth), the underlying computer organization and design is no more than hardware and software consisting of hierarchical layers using abstraction, with each lower layer hiding details from the level above. The C programming language provides a machine-independent interface to the underlying ISA and hardware, greatly enhancing a program’s expressiveness and readability. Unlike the bottom-up approach adopted in Yale Patt’s book, we take a top-down perspective and uncover, step by step, the details underneath the execution of a C program. To illustrate execution on a specific implementation while avoiding the unnecessary complexity of modern architectures, we choose an education-oriented architecture called Little Computer 3 (LC3 for short), introduced at the University of Texas at Austin. In this article, we briefly explain some important ideas and protocols, from the interpreter and compiler down to the fundamental digital logic devices. We observe how a program written in C is translated, assembled and finally executed by the computer, instruction by instruction and clock cycle by clock cycle, over the data path, and we see how combinations of the simplest CMOS circuits have shaped today’s fast-changing information technology industry.

1 Introduction

The C programming language was developed in 1972 by Dennis Ritchie at Bell Laboratories. It was initially developed for writing compilers and operating systems, and it therefore allows the C programmer to manipulate data items at a very low level. To illustrate the computer’s actions when executing a high-level programming language, we create an example C program (Figure 1). Although simple, the program shows some of the most important features of the language, which lets us discuss how things like preprocessing, linking, subroutines, control instructions, data movement instructions and memory-mapped I/O are implemented. In this way the article helps readers quickly gain a rough understanding of the lower layers of abstraction behind a high-level programming language.

We first need an overview of the hierarchy, or layers of abstraction, of the whole computing system. As shown in Figure 2, the instruction set architecture (ISA) plays a vital role as the interface between the hardware and the low-level system software. Computer Organization and Design defines the ISA as “anything the programmer need to know to make a binary machine work correctly” [1].

Figure 1: An example program calculating the absolute value, written in C
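The listing of Figure 1 is not reproduced in this text. The following is only a minimal sketch of what such a program might look like, assuming the macros ZERO and NEGONE and the stdio.h include that the article mentions when it discusses the preprocessor. The helper names absolute_value and print_absolute_value are hypothetical and exist only so that the logic can be called and checked on its own; according to the symbol table given later in the article, the original program declares x and y directly in main.

```c
#include <stdio.h>

#define ZERO   0     /* replaced with 0 by the preprocessor */
#define NEGONE -1    /* replaced with -1 by the preprocessor */

/* Hypothetical helper (not from the article): returns the absolute value
   of x. INT_MIN is not handled, because its negation overflows an int. */
int absolute_value(int x)
{
    int y;

    if (x < ZERO)        /* compiles to a conditional control instruction */
        y = NEGONE * x;  /* compiles to operate instructions */
    else
        y = x;
    return y;
}

/* Hypothetical helper: prints one result through the standard library,
   which is linked in after compilation. */
void print_absolute_value(int x)
{
    printf("|%d| = %d\n", x, absolute_value(x));
}
```

A main function that reads x with scanf and then calls print_absolute_value(x) would complete the program and exercise the input/output path as well.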

Figure 2: Layers of abstraction of modern computer architecture

Later in this article we discuss in a little more detail some of the ISA’s most important components, such as memory organization, the instruction set, addressing modes, privilege and priority. After all, it is the ISA that bridges the largest gap between a high-level programming language like C and the fundamental movements of electrons that support the whole computing system. In the rest of this article, we start from the top of the layers, beginning with the translation of programming languages (both high-level and assembly language), then span the gap between software and hardware with the help of the ISA, and finally observe how the commands from the software are carried out by the underlying circuits. Although LC3 [2, 3] is quite different from most computer architectures in use today, it helps beginners understand the intricacies of computing systems in a more elegant manner.

2 Programming Language

In Programming Language Pragmatics, programming is described as “the art of telling another human being what one wants the computer to do” [4]. A high-level programming language such as C is designed to be machine-independent and human-friendly. With high-level languages, programmers no longer need to write functionally similar code repeatedly for different machines, which considerably reduces their workload. These benefits come at a cost for the machine: a high-level language does not specify concrete actions on particular memory locations or registers, and implementations of the same program may differ markedly from one machine to another. Before a program written in such a machine-independent language can be executed, it must be translated into a machine-dependent language with the help of low-level system software such as a compiler or an interpreter. Compared with high-level languages, assembly language is much more machine-friendly. It is little more than a set of human-readable mnemonics with a well-defined correspondence to the 0s and 1s of the machine instructions, and the assembler completes this translation easily.

2.1 Translating high-level languages

In the rest of this section, we introduce two distinct translation techniques used for high-level languages and then discuss the technique C uses in more detail.

2.1.1 Interpretation and Compilation

In Compilers: Principles, Techniques, and Tools, the study of compilers is described as “full of beautiful examples where complicated real-world problems are solved by abstracting the essence of the problem mathematically” [5]. How the translation is done depends on the particular high-level language. Some languages, such as LISP, BASIC, Python and Java, are commonly translated by a technique called interpretation, carried out by an interpreter, while others, such as C, C++, Rust and FORTRAN, typically use compilation, carried out by a compiler.

The interpreter is a virtual machine that executes the program. It reads a single line (or a section, command or subroutine) of the high-level language program and directly carries out the effects of that line on the underlying hardware, repeating this until the end of the program. Interpreted code is more portable across computing systems, since it is merely input data to the interpreter on each platform. However, with the interpreter as an intermediary, the program takes much longer to execute. The compiler, on the other hand, does not execute the program itself. It analyzes the high-level language program as a whole and generates the corresponding assembly language or even machine language for the particular machine. The program needs to be compiled only once and can be executed many times afterwards, which greatly improves its efficiency. Each of the two translation techniques has pros and cons, depending on the specific application scenario.
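The read-and-execute loop of an interpreter described above can be sketched in a few lines of C. The three-command language below (push, add, mul) is a hypothetical toy invented for this illustration, not a real language, and the function name interpret is likewise an assumption.

```c
#include <stdio.h>
#include <string.h>

#define STACK_SIZE 64

/* Toy interpreter: reads the program one line at a time and carries out
   the effect of each line immediately. Lines that are unknown or cannot
   be executed are skipped. Returns the value left on top of the stack,
   or 0 if the stack is empty. */
int interpret(const char *program)
{
    int stack[STACK_SIZE];
    int top = 0;                       /* number of values on the stack */
    const char *line = program;

    while (*line != '\0') {
        int value;

        if (sscanf(line, "push %d", &value) == 1) {
            if (top < STACK_SIZE)
                stack[top++] = value;
        } else if (strncmp(line, "add", 3) == 0 && top >= 2) {
            stack[top - 2] += stack[top - 1];
            top--;
        } else if (strncmp(line, "mul", 3) == 0 && top >= 2) {
            stack[top - 2] *= stack[top - 1];
            top--;
        }

        /* Move on to the next line, if there is one. */
        const char *newline = strchr(line, '\n');
        if (newline == NULL)
            break;
        line = newline + 1;
    }
    return top > 0 ? stack[top - 1] : 0;
}
```

For example, interpret("push 2\npush 3\nadd\npush 4\nmul\n") evaluates (2 + 3) * 4. Nothing is translated ahead of time: every line is decoded again each time it is reached, which is where the run-time cost of interpretation comes from.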

Because the C programming language was initially developed for writing compilers and operating systems, adopting compilation helps guarantee the efficiency and dependability of the resulting programs.

2.1.2 The Compilation Process of C

The C compiler transforms the C source program into an output file in assembly language or machine code, called the executable image. Figure 3 illustrates the overall compilation process of C. As we can see, the C compiler has three interconnected components: the preprocessor, the compiler and the linker.

Figure 3: Overall compilation process of the C programming language

At the beginning of the C compilation process, the preprocessor scans the whole C source file, looking for and acting upon C preprocessor directives. Take the C program at the beginning of this article as an example. The preprocessor scans the whole program, replacing the macros ZERO and NEGONE with 0 and -1 and inserting the contents of stdio.h into the source file at the line of the corresponding include directive.

After that, the compiler transforms the preprocessed program into object modules in two major phases, analysis and synthesis. Analysis parses the program, breaking it into its constituent parts; synthesis translates these parts and optimizes the code for better performance. Each of the two phases is typically divided into subphases such as parsing, register allocation and instruction scheduling. While the compiler is working, it builds an internal bookkeeping structure called the symbol table. Again, take the C program at the beginning of this article as an example. Its symbol table is shown in Table 1. The symbol table records each variable’s identifier, type, location and scope. The memory allocated for the variables is arranged as a stack, so the location of each variable can be expressed as an offset relative to a certain memory location.

Table 1: Symbol table of the example C program
Identifier   Type   Location (as an offset)   Scope
x            int    0                         main
y            int    -1                        main

The linker takes over after the compiler has generated all the object modules. Its job is to link all the object modules into an executable image of the program, which completes the compilation process. Depending on the C compiler, the executable image may be written in either assembly language or machine code. In the latter case, the executable image can be loaded directly into memory and executed by the underlying hardware. Otherwise, it first needs to be assembled by a two-pass process.

2.2 The Two-pass Assembly Process

Let’s take a look at an example RISC-V assembly language program to get a concrete picture [6].
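The assembly listing of the greatest-common-divisor example is a figure and is not reproduced in this text. As a rough stand-in written in C rather than RISC-V assembly, the sketch below expresses Euclid's algorithm with explicit labels and jumps, which is close to how control flow looks at the assembly level. The label names euclid and finish are the two the article mentions; the subtraction-based structure and the function name gcd are assumptions, not a transcription of the figure.

```c
#include <assert.h>

/* Greatest common divisor of two positive integers by repeated
   subtraction. Each label marks a location that a jump can target,
   just as a label does in an assembly language program. */
unsigned int gcd(unsigned int a, unsigned int b)
{
    assert(a > 0 && b > 0);   /* the loop would never end for zero */

euclid:
    if (a == b)
        goto finish;          /* conditional jump out of the loop */
    if (a > b)
        a = a - b;
    else
        b = b - a;
    goto euclid;              /* unconditional jump back to the loop top */

finish:
    return a;
}
```

For instance, gcd(48, 18) passes through the pairs (30, 18), (12, 18), (12, 6) and (6, 6) before reaching finish. When such a program is assembled, each label has to be turned into a concrete address before the jumps can be encoded.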

Figure 4: RISC-V assembly language program calculating the greatest common divisor of two positive integers

As shown in Figure 4, an assembly language program mainly consists of opcodes and operands, labels, pseudo-ops (also known as assembler directives) and comments. (Note: for more information about RISC-V assembly language, see [1] and [6].) The transformation from assembly language to machine code is accomplished by the assembler in a two-pass process.

The first pass creates the symbol table. Similar to the symbol table in the compilation process, the symbol table in the assembly process is simply a mapping from symbolic names (labels) to their memory addresses. In the second pass, the assembler goes through the program again. The symbolic names in control instructions, such as euclid and finish in our example, are replaced with their memory addresses according to the symbol table built earlier. The assembly language instructions are then translated into 0s and 1s line by line, which yields the machine code executable image. As said in the last section, this executable image can be loaded directly into memory and executed by the underlying hardware.

3 Instruction Set Architecture

After we get an executable image of the C program, we are ready to see exactly how the actions of the computer are directed. The instruction set is the core of the ISA and can be regarded as the vocabulary of the computer’s language. In the rest of this section, we introduce the von Neumann model, explain how an instruction cycle is carried out, discuss operate, data movement and control instructions, and briefly cover the implementation of memory-mapped I/O, interrupts, subroutines and user/system mode.

3.1 The von Neumann Model

The von Neumann model, proposed by John von Neumann in 1946, has become the foundation of most computing systems today. Figure 4 shows an overall block diagram of the von Neumann model. As we can see, the model consists of five parts: memory, a processing unit, input, output and a control unit.

Figure 4: Overall block diagram of the von Neumann model

The control unit exists in every machine that can be called a computer (or, by another name, a universal Turing machine). It can be abstracted as a finite state machine (FSM) that keeps track of where we are in the execution of both the program and each instruction. PC and IR stand for the program counter, which stores the address of the next instruction, and the instruction register, which holds the current instruction. The machine moves from state to state based on the corresponding fields of the current instruction, directing the data path to take specific actions.

The state machine of a modern computer is usually too sophisticated to show in full, so we show only part of the LC3 state machine to give a rough idea (Figure 5).

Figure 5: A state machine of LC3; state transitions are triggered by the information in the instruction

The central idea of the von Neumann model is that the program and data are both stored as sequences of bits in the computer’s memory, and the program is executed one instruction at a time under the direction of the control unit.

Before we leave this topic, let’s briefly discuss system (kernel) mode and user mode. As shown in Figure 6, in modern computers application programs run on top of the operating system. Memory is usually divided into several regions, some of which are accessible only to system software. When a program or the standard library needs a function provided by the operating system, it invokes a system call. Otherwise, application programs are denied access to privileged memory and to device register addresses (discussed further under memory-mapped I/O). How the operating system works and how to improve its performance are extremely important questions in computer science. Readers interested in more information may consult [7].

Figure 6: An example of a system call, showing the difference between user mode and kernel mode

3.2 The Instruction Cycle

Instructions are executed under the direction of the control unit in a very systematic, step-by-step manner. The sequence of steps (called phases in computer science terminology) is the instruction cycle. A complete instruction cycle has six phases, although many instructions require only some of them: fetch, decode, evaluate address, fetch operands, execute and store result. In the FETCH phase, the computer obtains the next instruction from the address stored in the PC, loads it into the IR and increments the PC. In the DECODE phase, the computer examines the first several bits of the instruction (called the opcode) to figure out what the microarchitecture is being asked to do. If the instruction requests a load or a store, the computer calculates the address of the corresponding operand in the EVALUATE ADDRESS phase, according to the addressing mode of the instruction. The computer then accesses the memory and obtains the source operands it needs in the FETCH OPERANDS phase. In the EXECUTE phase, an operate instruction computes a value, a data movement instruction performs its load or store, and a control instruction redirects the PC. Finally, the result is written to its destination in the STORE RESULT phase. Each phase of the instruction cycle may take several clock cycles, depending on the specific ISA. Factors such as CPI (clock cycles per instruction), instruction count and clock rate are significant when evaluating a program’s performance.
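The phases of the instruction cycle can be traced in a small simulator. The sketch below is an illustration under several assumptions, not LC3's actual microarchitecture: it models only ADD, LD, BR and HALT from the LC-3 instruction set, shrinks memory to 256 words, and completes each instruction in a single call to step rather than clock cycle by clock cycle. The names Machine, step and demo_sum are invented for this example.

```c
#include <stdint.h>

#define MEM_WORDS 256u   /* assumption: tiny memory; LC-3 has 65536 words */

enum { CC_N = 4, CC_Z = 2, CC_P = 1 };  /* same order as the n, z, p bits of BR */

typedef struct {
    uint16_t mem[MEM_WORDS];  /* program and data share one memory */
    uint16_t reg[8];          /* general-purpose registers R0..R7 */
    uint16_t pc;              /* address of the next instruction */
    uint16_t ir;              /* the instruction being executed */
    uint16_t cc;              /* condition codes */
    int      halted;
} Machine;

/* Interpret the low `bits` bits of value as a two's-complement number. */
static int sign_extend(uint16_t value, int bits)
{
    int sign = 1 << (bits - 1);
    int field = value & ((1 << bits) - 1);
    return (field ^ sign) - sign;
}

static void set_cc(Machine *m, uint16_t result)
{
    if (result == 0)
        m->cc = CC_Z;
    else if (result & 0x8000u)
        m->cc = CC_N;
    else
        m->cc = CC_P;
}

/* Run one complete instruction cycle. */
void step(Machine *m)
{
    /* FETCH: load the IR from the address in the PC, then increment the PC. */
    m->ir = m->mem[m->pc % MEM_WORDS];
    m->pc = (uint16_t)(m->pc + 1);

    /* DECODE: the opcode is the top four bits of the instruction. */
    unsigned opcode = m->ir >> 12;
    unsigned dr = (m->ir >> 9) & 0x7u;  /* destination register, or n/z/p for BR */

    switch (opcode) {
    case 0x1: {  /* ADD, an operate instruction */
        /* FETCH OPERANDS: from registers or from the immediate field. */
        int a = m->reg[(m->ir >> 6) & 0x7u];
        int b = (m->ir & 0x20u) ? sign_extend(m->ir, 5) : m->reg[m->ir & 0x7u];
        /* EXECUTE and STORE RESULT. */
        m->reg[dr] = (uint16_t)(a + b);
        set_cc(m, m->reg[dr]);
        break;
    }
    case 0x2: {  /* LD, a data movement instruction (PC-relative) */
        /* EVALUATE ADDRESS: incremented PC plus sign-extended offset. */
        uint16_t addr = (uint16_t)(m->pc + sign_extend(m->ir, 9));
        /* FETCH OPERANDS from memory, then STORE RESULT in the register. */
        m->reg[dr] = m->mem[addr % MEM_WORDS];
        set_cc(m, m->reg[dr]);
        break;
    }
    case 0x0:    /* BR, a control instruction */
        /* EXECUTE: redirect the PC if a selected condition code is set. */
        if (dr & m->cc)
            m->pc = (uint16_t)(m->pc + sign_extend(m->ir, 9));
        break;
    default:     /* TRAP x25 (HALT) and anything unsupported stop the sketch */
        m->halted = 1;
        break;
    }
}

/* Example: add 3 + 2 + 1 with a small loop and return the sum left in R1. */
uint16_t demo_sum(void)
{
    static const uint16_t program[] = {
        0x2004,  /* x0: LD  R0, x5      ; R0 = 3            */
        0x1240,  /* x1: ADD R1, R1, R0  ; R1 += R0          */
        0x103F,  /* x2: ADD R0, R0, #-1 ; R0 -= 1           */
        0x03FD,  /* x3: BRp x1          ; loop while R0 > 0 */
        0xF025,  /* x4: HALT (TRAP x25)                     */
        0x0003   /* x5: data word 3                         */
    };
    Machine m = {0};

    for (unsigned i = 0; i < sizeof program / sizeof program[0]; i++)
        m.mem[i] = program[i];
    for (int steps = 0; !m.halted && steps < 1000; steps++)
        step(&m);
    return m.reg[1];
}
```

Note that ADD skips the EVALUATE ADDRESS phase and BR uses only FETCH, DECODE and EXECUTE, which matches the remark that many instructions need only some of the six phases.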

For the influence of hardware and software on CPI, instruction count and clock rate, see Figure 7.

Figure 7: Influence of different components on a program’s performance

3.3 The Instruction Set & Memory-mapped IO and Interrupt

An instruction is defined by three parts: its opcode, data type and addressing mode. Roughly speaking, all instructions can be divided into three distinct categories: operate instructions, data movement instructions and control instructions. Some of the main RISC-V instructions are shown in Figure 8.

Figure 8: RISC-V reference card (main part), listing kinds of instructions and their assembly language forms

Operate instructions process data, performing either arithmetic or logic operations. Their operands can be found in only two places: in registers or in the instruction itself (an immediate operand, in computer science terminology). RISC-V supports many operate instructions, such as ADD, ADDI, AND, ANDI, SLL and SRL, which perform arithmetic, logic and shift operations.

Data movement instructions move information between general-purpose registers and either memory or input/output devices (which can be regarded as a special kind of memory). Specifically, they load data from memory into registers or store data from registers to memory. The memory address is calculated from the address generation bits in the instruction. The calculation rule is determined by the addressing mode of the instruction, such as PC-relative mode, indirect mode, base+offset mode and immediate mode [1, 2]. Different addressing modes exist so that instructions can reach as much of memory as possible, including locations relatively far from the PC. Data movement instructions are also the workhorses of input/output. In most modern computers, device registers are mapped to particular addresses reserved for I/O rather than to ordinary memory locations. The computer accesses the data in these memory-mapped device registers with exactly the same data movement instructions to perform input/output tasks. This is usually done in one of two ways, polling or interrupts. With polling, the computer repeatedly checks the device registers whenever an I/O task is needed. With interrupts, by contrast, the computer pauses to perform the I/O task only when it detects a signal indicating that an input or output is ready, and then returns to the interrupted task as if nothing had happened.

Control instructions change the sequence of executed instructions, conditionally or unconditionally. They do so by changing the contents of the PC in the EXECUTE phase of the instruction cycle. Otherwise, the computer executes the instruction at the next address, since the PC is always incremented at the end of the FETCH phase. The condition of a conditional control instruction is checked via the condition codes, which reflect the result of the last instruction that changed the value in a register.
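The contrast between polling and interrupt-driven I/O described above can be sketched with a simulated device. Everything here is an assumption made for illustration: the Keyboard structure stands in for a pair of memory-mapped status and data registers, the ready flag in bit 15 follows the LC-3 convention, and keyboard_tick plays the role of the hardware. On a real machine the registers would live at fixed addresses and the handler would be invoked by the interrupt mechanism rather than by a function call.

```c
#include <stddef.h>
#include <stdint.h>

#define READY_BIT 0x8000u   /* bit 15 of the status register means "ready" */

typedef void (*Handler)(uint16_t data);

typedef struct {
    volatile uint16_t status;  /* device status register */
    volatile uint16_t data;    /* device data register */
    int      delay;            /* simulation only: ticks until a key arrives */
    uint16_t key;              /* simulation only: the key that will arrive */
    Handler  on_ready;         /* interrupt handler, or NULL when polling */
} Keyboard;

/* One step of simulated device time. */
static void keyboard_tick(Keyboard *kb)
{
    if (kb->delay > 0 && --kb->delay == 0) {
        kb->data = kb->key;
        kb->status |= READY_BIT;
        if (kb->on_ready != NULL) {          /* the device signals the CPU */
            kb->on_ready(kb->data);
            kb->status &= (uint16_t)~READY_BIT;
        }
    }
}

/* Polling: check the status register again and again until it is ready.
   Returns the number of checks that were needed, or -1 on timeout. */
int poll_read(Keyboard *kb, uint16_t *out, int max_checks)
{
    for (int checks = 1; checks <= max_checks; checks++) {
        keyboard_tick(kb);
        if (kb->status & READY_BIT) {
            *out = kb->data;                 /* an ordinary load from the device */
            kb->status &= (uint16_t)~READY_BIT;
            return checks;
        }
    }
    return -1;
}

static uint16_t last_key;                    /* written by the handler */

static void on_key(uint16_t data)
{
    last_key = data;
}

/* Polling demo: the key arrives after 5 ticks, so 5 checks are needed. */
int demo_polling(void)
{
    Keyboard kb = { 0, 0, 5, 'A', NULL };
    uint16_t key = 0;

    return poll_read(&kb, &key, 100);
}

/* Interrupt demo: the main loop never looks at the status register. */
uint16_t demo_interrupt(void)
{
    Keyboard kb = { 0, 0, 3, 'A', on_key };
    unsigned long work = 0;

    last_key = 0;
    while (last_key == 0) {
        work++;                 /* the CPU is free to do useful work here */
        keyboard_tick(&kb);     /* device time passes in the background */
    }
    (void)work;
    return last_key;
}
```

In the polling version every iteration is spent inspecting the status register, while in the interrupt version the main loop learns about the key only through the handler.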

4 Introduction to Microarchitecture and Digital Logic Devices

At the end of this article, we briefly introduce some concepts of the underlying hardware. Figure 9 shows the microarchitecture of LC3.

Figure 9: The data path of LC3, including components for interrupt control

The microarchitecture of the computer is composed of combinational logic circuits and sequential logic circuits. Combinational logic circuits are responsible for logic decisions; basic components include the encoder, decoder, multiplexer (mux), demultiplexer (demux) and full adder. Sequential logic circuits, on the other hand, are the foundation of storage structures and finite state machines; their outputs depend on both the current inputs and the results of past ones. Basic components include latches and flip-flops. In fact, all these digital logic devices are systematic combinations of MOSFETs (metal-oxide-semiconductor field-effect transistors). It is the opening and closing of those transistors that creates our world of 1s and 0s.

5 Conclusion

Let us recall the words of David Patterson quoted at the beginning of this article: computing systems are nothing more than “hardware and software consisting of hierarchical layers using abstraction, with each lower layer hiding details from the level above”. It is amazing to see how a high-level C program is compiled, assembled and finally executed by the underlying hardware, instruction by instruction, clock cycle after clock cycle. It is as if we were the conductors of an unprecedentedly sophisticated orchestra, creating splendid symphonies with simple waves of the baton in our hands. No one can be indifferent to this greatest artifact in human history.

Acknowledgements: This article could never have existed without the great efforts of Prof. Hong An and Prof. Junxia Zhang in the courses Introduction to Computing Systems (H) and Analog and Digital Circuits.

References
[1] David A. Patterson, John L. Hennessy. Computer Organization and Design, RISC-V Edition.
[2] Yale N. Patt, Sanjay J. Patel. Introduction to Computing Systems, 2nd Edition.
[3] LC3 Simulator. http://wchargin.github.io/lc3web/
[4] Michael L. Scott. Programming Language Pragmatics. Morgan Kaufmann.
[5] Alfred V. Aho et al. Compilers: Principles, Techniques, and Tools.
[6] The RISC-V Instruction Set Manual.
[7] Abraham Silberschatz, Peter Baer Galvin, Greg Gagne. Operating System Concepts.
