Top-down Perspective: The Computer Organization and Design Underneath the Execution of C Programming Language
Mingkai Li, University of Science and Technology of China

Abstract
In Yale Patt’s book Introduction to Computing Systems: From Bits and Gates to C and Beyond, the intricacies of the computing world reveal themselves as a huge, systematically interconnected collection of very simple parts. Implementations of modern architectures vary greatly in pursuit of shorter response time or greater throughput (sometimes called bandwidth). Yet the underlying computer organization and design is a hierarchy of hardware and software layers built on abstraction, with each lower layer hiding its details from the layer above. The C programming language provides a machine-independent interface to the underlying ISA and hardware, greatly enhancing a program’s expressiveness and readability. In contrast to the bottom-up approach of Patt’s book, we take a top-down perspective and uncover, step by step, what lies underneath the execution of a C program. We want to describe execution on a specific implementation while avoiding the unnecessary complexity of modern architectures. For this reason we choose an education-oriented implementation called Little Computer 3 (LC3 for short), introduced at the University of Texas at Austin. In this article, we briefly explain important ideas and protocols that range from the interpreter and compiler down to the most fundamental digital logic devices. We follow a program written in C as it is translated, assembled, and finally executed by the computer over the data path, instruction by instruction and clock cycle by clock cycle. Along the way we see how combinations of the simplest CMOS circuits have shaped today’s fast-changing information technology industry.

1 Introduction
The C programming language was developed in 1972 by Dennis Ritchie at Bell Laboratories. It was initially developed for writing compilers and operating systems, so it lets the C programmer manipulate data items at a relatively low level. To illustrate what the computer does when it executes a high-level program, we use the example C program in Figure 1. Although simple, the program exhibits some of the most important features of the language. This lets us discuss how preprocessing, linking, subroutines, control instructions, data movement instructions, memory-mapped I/O and similar mechanisms are implemented. The aim is to help readers quickly form a rough picture of all the layers of abstraction behind a high-level programming language.

Figure 1: An example program calculating the absolute value, written in C

First, we need an overview of the hierarchy of abstraction layers in the whole computing system. As shown in Figure 2, the instruction set architecture (ISA) plays a vital role as the interface between the hardware and the low-level system software. Computer Organization and Design defines the ISA as “anything programmers need to know to make a binary machine language program work correctly” [1]. Later in this article we discuss some of its most important components in more detail, such as memory organization, the instruction set, addressing modes, privilege, and priority. After all, it is the ISA that bridges the wide gap between a high-level programming language like C and the movement of electrons that ultimately supports the whole computing system. In the rest of this article, we start from the top of these layers. We begin with the translation of programming languages (both high-level and assembly language), cross the gap between software and hardware with the help of the ISA, and observe how commands from the software are carried out by the underlying circuits. LC3 [2, 3] is quite different from most implementations of computer architecture today, but it helps beginners understand the intricacies of computing systems in a cleaner way.

Figure 2: Layers of abstraction of modern computer architecture

2 Programming Language
In Programming Language Pragmatics, programming is described as “the art of telling another human being what one wants the computer to do” [4]. High-level programming languages such as C are designed to be machine-independent and human-friendly. With high-level languages, programmers no longer need to write functionally similar code repeatedly for different machines, which considerably reduces their workload. These benefits come with the biggest problem for the machine: a high-level program does not specify concrete actions on particular memory locations or registers, and implementations of the same program on different machines may be distinctly different. To be executed, such a machine-independent program must be translated for a specific machine with the help of low-level system software such as a compiler or an interpreter. Compared with high-level programming languages, assembly language is much more machine-friendly. It is essentially a set of human-readable mnemonics with a well-defined correspondence to the 0s and 1s of machine instructions, so the assembler translates it into machine code easily.

2.1 Translating high-level languages
In the rest of this section, we introduce two distinct translation techniques used by high-level languages, and then discuss the technique C uses in more detail.

2.1.1 Interpretation and Compilation
In Compilers: Principles, Techniques, and Tools, the study of compilers is described as “full of beautiful examples where complicated real-world problems are solved by abstracting the essence of the problem mathematically” [5]. How the translation is done depends on the particular high-level language. Languages such as LISP, BASIC, Python and Java are typically implemented by interpretation, using an interpreter. Languages such as C, C++, Rust and FORTRAN are typically implemented by compilation, using a compiler.

The interpreter is a virtual machine that executes the program. It reads a single line (or a section, command or subroutine) of the high-level language program and directly carries out its effects on the underlying hardware, repeating this until the end of the program. Interpreted code is more portable across computing systems, since on every platform it is nothing more than input data to the interpreter. However, with the interpreter acting as an intermediary, the program takes much longer to execute. The compiler, on the other hand, does not execute the program itself. It analyzes the high-level language program as a whole and generates the corresponding assembly language, or even machine language, for the particular machine. The program needs to be compiled only once and can then be executed many times, which makes execution considerably more efficient. Both translation techniques have pros and cons that depend on the application scenario. C was initially developed for writing compilers and operating systems, and adopting compilation helps ensure the efficiency and dependability of the resulting programs.

2.1.2 The Compilation Process of C
The C compiler transforms a C source program into an output file of assembly language or machine code called the executable image. Figure 3 illustrates the overall compilation process of C. As the figure shows, the C compiler has three interconnected components: the preprocessor, the compiler, and the linker.

At the beginning of the compilation process, the preprocessor scans the whole C source file, looking for and acting upon C preprocessor directives. Take the C program at the beginning of this article as an example. The preprocessor scans the whole program, replaces the macros ZERO and NEGONE with 0 and -1, and inserts the contents of stdio.h into the source file at the corresponding line.

After that, the compiler transforms the preprocessed program into object modules in two major phases called analysis and synthesis. The analysis phase parses the program and breaks it into its constituent parts. The synthesis phase translates these parts and optimizes the code for better performance at the same time. Each of these two phases is typically divided into many subphases such as […] a bookkeeping mechanism called the symbol table is created.

Again, take the example C program. Its symbol table is shown in Table 1. The symbol table records each variable’s identifier, type, location and scope. The memory allocated for the variables is arranged as a stack, so the location of each variable can be expressed as an offset relative to a certain memory location.

Table 1: Symbol table of the example C program
Identifier   Type   Location (as an offset)   Scope
x            int     0                        main
y            int    -1                        main

The linker takes over after the compiler has generated all the object modules. It is the linker’s job to link all the object modules together into the executable image of the program, which completes the compilation process. Depending on the C compiler, the executable image may be written in either assembly language or machine code. In the latter case, the executable image can be loaded directly into memory and executed by the underlying hardware. Otherwise, it must first be assembled by a two-pass process.

2.2 The Two-pass Assembly Process
Let’s look at an example RISC-V assembly language program to get a concrete picture [6].
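Assembly takes two passes because labels can be used before they are defined. The C code below is a rough sketch of that idea for a hypothetical, pre-tokenized program. The first pass assigns an address to every line and records each label in a symbol table. The second pass turns every label reference into a PC-relative offset. Several details are illustrative assumptions: the struct layout, the function names (pass_one, lookup, pass_two), and the rule that every line occupies exactly one memory word. Following LC-3 conventions, offsets are measured from the incremented PC (address + 1). A real assembler also has to handle directives, multi-word data, instruction encoding and error reporting.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SYMBOLS 32

/* Hypothetical pre-tokenized source line. Assumption: every line
   occupies exactly one 16-bit memory word, as LC-3 instructions do. */
struct line {
    const char *label;  /* label defined on this line, or NULL */
    const char *target; /* label the instruction refers to, or NULL */
};

/* Symbol table built by pass 1: label name -> memory address. */
struct symtab {
    const char *name[MAX_SYMBOLS];
    uint16_t address[MAX_SYMBOLS];
    int count;
};

/* Pass 1: give every line an address and record each label's address. */
void pass_one(const struct line *src, int n, uint16_t origin,
              struct symtab *tab)
{
    tab->count = 0;
    for (int i = 0; i < n; i++) {
        if (src[i].label != NULL) {
            assert(tab->count < MAX_SYMBOLS);
            tab->name[tab->count] = src[i].label;
            tab->address[tab->count] = (uint16_t)(origin + i);
            tab->count++;
        }
    }
}

/* Return the address bound to name, or -1 if the label is undefined. */
int lookup(const struct symtab *tab, const char *name)
{
    for (int i = 0; i < tab->count; i++)
        if (strcmp(tab->name[i], name) == 0)
            return tab->address[i];
    return -1;
}

/* Pass 2: turn each label reference into a PC-relative offset, measured
   from the incremented PC (address + 1) as in LC-3. Lines without a
   reference get offset 0. Returns -1 on an undefined label, else 0. */
int pass_two(const struct line *src, int n, uint16_t origin,
             const struct symtab *tab, int *offsets)
{
    for (int i = 0; i < n; i++) {
        offsets[i] = 0;
        if (src[i].target == NULL)
            continue;
        int addr = lookup(tab, src[i].target);
        if (addr < 0)
            return -1; /* undefined symbol */
        offsets[i] = addr - (origin + i + 1);
    }
    return 0;
}
```

Here is an example. Take a program with origin x3000 whose first line branches to a label DONE on its fourth line. The reference resolves to offset 2, because DONE sits at x3003 and the incremented PC is x3001. This forward reference is exactly why one pass is not enough: when the assembler first reaches the branch, DONE has not yet been defined.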
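Once labels are resolved, the assembler emits each instruction as a 16-bit word. This is the well-defined correspondence between mnemonics and 0s and 1s mentioned earlier. The sketch below encodes three LC-3 instructions using the LC-3 field layout. ADD (opcode 0001) has a register mode, with bit 5 = 0, and an immediate mode, with bit 5 = 1 and a 5-bit two’s-complement immediate. NOT (opcode 1001) has its low six bits all set to 1. The function names are hypothetical; only the bit layouts come from the LC-3 ISA. LC-3 has no subtract instruction, so NOT followed by ADD #1 is the usual way to negate a value, for instance when computing an absolute value.

```c
#include <assert.h>
#include <stdint.h>

/* LC-3 opcodes occupy bits [15:12] of every instruction. */
#define LC3_OP_ADD 0x1u /* 0001 */
#define LC3_OP_NOT 0x9u /* 1001 */

/* ADD DR, SR1, SR2 (register mode): bit 5 is 0, bits [4:3] are 00. */
uint16_t lc3_add_reg(unsigned dr, unsigned sr1, unsigned sr2)
{
    assert(dr < 8 && sr1 < 8 && sr2 < 8); /* LC-3 has only R0-R7 */
    return (uint16_t)((LC3_OP_ADD << 12) | (dr << 9) | (sr1 << 6) | sr2);
}

/* ADD DR, SR1, #imm5 (immediate mode): bit 5 is 1, bits [4:0] hold imm5. */
uint16_t lc3_add_imm(unsigned dr, unsigned sr1, int imm5)
{
    assert(dr < 8 && sr1 < 8);
    assert(imm5 >= -16 && imm5 <= 15); /* 5-bit two's-complement range */
    return (uint16_t)((LC3_OP_ADD << 12) | (dr << 9) | (sr1 << 6) |
                      (1u << 5) | ((unsigned)imm5 & 0x1Fu));
}

/* NOT DR, SR: bits [5:0] are all 1s. */
uint16_t lc3_not(unsigned dr, unsigned sr)
{
    assert(dr < 8 && sr < 8);
    return (uint16_t)((LC3_OP_NOT << 12) | (dr << 9) | (sr << 6) | 0x3Fu);
}
```

For instance, ADD R1, R2, R3 encodes to x1283 (0001 001 010 0 00 011), and ADD R0, R0, #-1 encodes to x103F (0001 000 000 1 11111).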