
Top-down Perspective: The Computer Organization and Design Underneath the Execution of C Programming Language

Mingkai Li¹
¹University of Science and Technology of China

Abstract

In Yale Patt's book Introduction to Computing Systems: From Bits and Gates to C and Beyond, the intricacies of the computing world reveal themselves as a huge, systematically interconnected collection of very simple parts. Although implementations of modern architectures vary greatly in order to achieve shorter response time or greater throughput (sometimes called bandwidth), the underlying computer organization and design is no more than hardware and software arranged in hierarchical layers of abstraction, with each lower layer hiding its details from the level above. The C programming language provides a machine-independent interface to the underlying ISA and hardware, greatly enhancing a program's expressiveness and readability. Unlike the bottom-up approach adopted in Yale Patt's book, we take a top-down perspective to uncover, step by step, the details underneath the execution of a C program. To follow the execution on a specific implementation while avoiding the unnecessary complexity of modern architectures, we choose an education-oriented implementation called Little Computer 3 (LC3 for short), introduced by the University of Texas at Austin. In this article, we briefly elaborate on some important ideas and protocols, from the interpreter/compiler down to the most fundamental digital logic devices. We observe how a program written in C is translated, assembled and finally executed by the computer, instruction by instruction and clock cycle by clock cycle, over the data path, and we see how combinations of the simplest CMOS circuits have shaped today's fast-changing information technology industry.

1 Introduction

The C programming language was developed in 1972 by Dennis Ritchie at Bell Laboratories. The language was initially developed for writing compilers and operating systems, and it therefore allows the C programmer to manipulate data items at a relatively low level. To better illustrate the computer's actions when executing a high-level programming language, we create an example C program (Figure 1). Although simple, the program shows some of the most important features of the language, allowing us to discuss how things like preprocessing, linking, subroutines, control instructions, data movement instructions and memory-mapped I/O are implemented. In this way, the article helps readers quickly form a rough picture of the lower layers of abstraction behind a high-level programming language.

Figure 1: An example program calculating the absolute value written in C

To begin with, we need an overview of the hierarchy, or layers of abstraction, of the whole computing system. As shown in Figure 2, the instruction set architecture (ISA) plays a vital role as the interface between the hardware and the low-level system software. Computer Organization and Design defines the ISA as "anything the programmer need to know to make a binary machine work correctly" [1]. Later in this article, we'll discuss in a little more detail some of its most important components, such as memory organization, the instruction set, addressing modes, privilege and priority. After all, it is the ISA that bridges the biggest gap between a high-level programming language like C and the movements of electrons that support the whole computing system.

Figure 2: Layers of abstractions of modern computer architecture

In the rest of this article, we start from the top of the layers above, beginning with the translation of the programming language (both high-level and assembly language), spanning the gap between software and hardware with the help of the ISA, and observing how the commands from the software are executed by the underlying circuits. Although LC3 [2, 3] is quite different from most implementations of computer architecture today, it helps beginners understand the intricacies of computing systems in a more elegant manner.

2 Programming Language

In Programming Language Pragmatics, the programming language is described as "the art of telling another human being what one wants the computer to do" [4]. A high-level programming language such as C is designed to be machine-independent and human-friendly. With high-level languages, the programmer no longer needs to write functionally similar code repeatedly for different machines, which considerably reduces the workload. Alongside these benefits, the biggest problem for the machine is that a high-level programming language does not define any specific actions on specific memory locations or registers, and the implementations of the same program on different machines may be distinctly different. To execute a program written in such a machine-independent language, it must be translated into a specific machine-dependent assembly language with the help of low-level system software such as a compiler or an interpreter. Compared with high-level programming languages, assembly language is much more machine-friendly. It is nothing more than a set of human-readable mnemonics with a well-defined correspondence to the 0s and 1s in the instructions. The translation from assembly language to machine code is completed easily by the assembler.

2.1 Translating high-level languages

In the rest of this section, we introduce two distinct translation techniques adopted by high-level languages, and then discuss the technique C uses in more detail.

2.1.1 Interpretation and Compilation

In Compilers: Principles, Techniques, and Tools, the study of compilers is described as "full of beautiful examples where complicated real-world problems are solved by abstracting the essence of the problem mathematically" [5]. How the translation is done depends on the particular high-level language. Some languages like LISP, BASIC, Python and Java adopt a translation technique called interpretation, carried out by an interpreter, while other languages such as C, C++, Rust and FORTRAN may use another technique called compilation, carried out by a compiler.

The interpreter is a virtual machine that executes the program. It reads a single line (or a section, command or subroutine) of the high-level language program and directly carries out its effects on the underlying hardware, repeating this until the end of the program. Interpreted code is more portable across different computing systems, since it is nothing more than input data to the interpreter on each platform. However, with the interpreter as an intermediary, the program takes much longer to execute. The compiler, on the other hand, doesn't execute the program itself. It analyzes the high-level language program as a whole and generates the corresponding assembly language, or even machine language, for the particular machine. The high-level language program needs to be compiled only once and can be executed many times afterwards, which greatly improves the program's efficiency. Each of the two translation techniques has pros and cons depending on the specific application scenario. As the C programming language was initially developed for writing compilers and operating systems, the adoption of the compilation technique guarantees the effectiveness and dependability of the product.

2.1.2 The Compilation Process of C

The C compiler transforms the C source program into an output assembly language or machine code file called the executable image. Figure 3 shows an illustration of the overall compilation process of C. As we can see, the C compiler has three interconnected components: the preprocessor, the compiler and the linker.

At the beginning of the C compilation process, the preprocessor scans the whole C source file, looking for and acting upon C preprocessor directives. Let's take the C program at the beginning of this article as an example. The preprocessor scans the whole program, substituting the macros ZERO and NEGONE with 0 and -1, and inserting the contents of stdio.h into the source file at the corresponding line.

After that, the compiler transforms the preprocessed program into object modules in two major phases called analysis and synthesis. Analysis parses the program, breaking it into its constituent parts, and synthesis translates these parts, optimizing the code for better performance at the same time. Each of these two phases is typically divided into many subphases such as [...] bookkeeping mechanism called the symbol table is created. Again, let's take the C program at the beginning of this article as an example. The symbol table of the program is shown in Table 1. The symbol table keeps each variable's identifier, type, location and scope. The memory allocated for the variables is arranged as a stack, so the location of each variable can be expressed as an offset relative to a certain memory location.

Table 1: Symbol table of the example C program

Identifier   Type   Location (as an offset)   Scope
x            int    0                         main
y            int    -1                        main

The linker takes over after the compiler has generated all the object modules. It is the linker's job to link all the object modules into an executable image of the program, which completes the whole compilation process. Depending on the C compiler, the executable image may be written in either assembly language or machine code. In the latter case, the executable image can be loaded directly into memory and executed by the underlying hardware. Otherwise, it first needs to be assembled by a two-pass process.

2.2 The Two-pass Assembly Process

Let's take a look at an example RISC-V assembly language program to get an intuitive picture [6].
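Before turning to the assembly example, the preprocessing step described in Section 2.1.2 can be made concrete with a small sketch. Figure 1 is not reproduced in this text, so the code below is only an assumed reconstruction: the macro names ZERO and NEGONE and the header stdio.h come from the article, while the functions absolute_value and print_absolute_value are hypothetical names introduced for illustration.

```c
#include <stdio.h>   /* the preprocessor inserts the contents of stdio.h here */

#define ZERO   0     /* each later ZERO token is replaced by 0    */
#define NEGONE -1    /* each later NEGONE token is replaced by -1 */

/* Hypothetical helper (not taken from Figure 1): returns |x|.
 * The comments show what the compiler proper sees after preprocessing. */
int absolute_value(int x)
{
    int y;
    if (x < ZERO) {          /* after preprocessing: if (x < 0) {    */
        y = x * NEGONE;      /* after preprocessing: y = x * -1;     */
    } else {
        y = x;
    }
    return y;
}

/* Hypothetical helper: uses printf, which is declared by stdio.h. */
void print_absolute_value(int x)
{
    printf("%d\n", absolute_value(x));
}
```

With GCC or Clang, passing the -E option stops after preprocessing and prints the expanded source, which is one way to observe the substitutions directly.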

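The symbol table of Table 1 in Section 2.1.2 can also be pictured as a small data structure. The sketch below is an illustration under stated assumptions, not the layout any particular C compiler uses: the type SymbolEntry, the functions lookup and variable_address, and the frame_base parameter are all hypothetical names, and memory is assumed to be word-addressed so that an offset of -1 means "one word below the base".

```c
#include <stddef.h>
#include <string.h>

/* One row of a symbol table, mirroring the columns of Table 1. */
typedef struct {
    const char *identifier;
    const char *type;
    int         offset;   /* location, as an offset from the frame base */
    const char *scope;
} SymbolEntry;

/* The two rows of Table 1. */
static const SymbolEntry symbol_table[] = {
    { "x", "int",  0, "main" },
    { "y", "int", -1, "main" },
};

/* Return the entry for `identifier` in `scope`, or NULL if there is none. */
const SymbolEntry *lookup(const char *identifier, const char *scope)
{
    size_t n = sizeof symbol_table / sizeof symbol_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(symbol_table[i].identifier, identifier) == 0 &&
            strcmp(symbol_table[i].scope, scope) == 0) {
            return &symbol_table[i];
        }
    }
    return NULL;
}

/* Assumed addressing rule: address = frame base + recorded offset.
 * frame_base is a hypothetical word address chosen by the caller. */
int variable_address(const SymbolEntry *entry, int frame_base)
{
    return frame_base + entry->offset;
}
```

For instance, if the frame base were word address 100, x would live at address 100 and y at address 99. The point of the design is that the compiler only needs to record offsets at compile time; the actual base is known only when the program runs.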