
Computer Architecture Overview ICS332 — Operating Systems

Henri Casanova ([email protected])

Spring 2018

1946: ENIAC

- Electronic Numerical Integrator And Computer, aka “Giant Brain”
- First electronic general-purpose computer
  - Before that, “computers” were humans, who could use non-programmable mechanical and later electrical computation tools
- Could be reprogrammed (Stored-Program Computer instead of Fixed-Program Computer)
- Main sponsor: University of Pennsylvania / Ballistic Research Laboratory ($487k, equivalent to about $7M in 2016)
- Designers: Mauchly and Eckert
- First operators (i.e., programmers): the 6 “ENIAC Girls” (McNulty, Jennings, Snyder, Wescoff, Bilas, and Lichterman)

1946: ENIAC (Features)

- 1,000 times faster than the (specialized) electro-mechanical equivalent
- 2,400 times faster than a (specialized) human being (30 seconds instead of 20 hours)
- 100 kHz / 5 kIPS (now: 4 GHz / 5,000 MIPS)
- 1,000 bits of RAM (i.e., 0.12 KiB)
- 150 kW (now: 200 W)
- 17,468 vacuum tubes (failure prone, power hungry)
- 8 × 3 × 100 ft; 27 metric tons (60,000 pounds)

1946: ENIAC (Pictures)

[Photographs of ENIAC; images not reproduced.]

Von Neumann

- ENIAC design frozen in 1943; Eckert and Mauchly work on a new design: the EDVAC
- 1944: Von Neumann (1903-1957) joins Eckert and Mauchly, writes a memo formalizing their ideas
- This became the Von Neumann Architecture Model:
  - A Central Processing Unit performs operations and controls the sequence of operations
  - A Memory Unit contains code and data
  - Some kind of Input and Output mechanisms (I/O)

Von Neumann Model

- Amazingly, it is still possible to think of the computer this way at a conceptual level (model from ∼70 years ago!)

[Diagram: CPU ⇐⇒ Memory, with a second double arrow (⇕) connecting to I/O]

- Today a computer looks more like:

[Diagram: CPU, Disk Controller, USB Controller, Graphics Adapter, and Memory]

Von Neumann Model: Origins

- 1847: Boolean algebra
  - Truth value (true / false), Boolean logic, Bit (binary digit)
- 1937: Shannon’s MS Thesis
  - Any logical, numerical relationship can be built using Boolean algebra
- Therefore, any “information” can be represented in binary form, and therefore we can build computers that only understand binary
- Building computers this way is technologically convenient:
  - 0 Volt: False (0)
  - ∼5 Volt: True (1)

The Von Neumann Architecture

[Diagram: CPU ⇐⇒ Memory, with a second double arrow (⇕) connecting to I/O]

Memory Unit

- Called Memory, or RAM (Random Access Memory) for short
  - I will say “memory” or “RAM” interchangeably
- The basic unit of memory is the byte (or octet, or octad, or octade)
  - 1 Byte = 8 bits, e.g., “0110 1011”

Memory Unit

- The memory contains numerical “information” / “data” / “content”
- The “data” are represented in memory in binary, as bytes
- To be used, the data need to be located precisely in memory: addresses
- ... but because computers only understand binary, the addresses are binary too:

Address      Content      (Human)
0000 0000    0000 0011    3
0000 0001    0000 0001    1
0000 0010    0000 0100    4
0000 0011    0000 0001    1
0000 0100    0001 1001    25
0000 0101    0000 1001    9
0000 0110    0000 0010    2
0000 0111    1010 0111    167
0000 1000    1111 1011    -5
......
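The (Human) column is only one possible reading of each byte: the table shows 1010 0111 as the unsigned value 167, but 1111 1011 as the two's-complement value -5. Here is a minimal Python sketch of both readings. The helper names `as_unsigned` and `as_signed` are made up for this illustration.

```python
# The nine bytes from the table above, written as binary literals.
memory = bytes([
    0b00000011, 0b00000001, 0b00000100, 0b00000001, 0b00011001,
    0b00001001, 0b00000010, 0b10100111, 0b11111011,
])

def as_unsigned(byte):
    """Read an 8-bit pattern as an unsigned integer (0..255)."""
    return byte

def as_signed(byte):
    """Read the same 8-bit pattern as a two's-complement integer (-128..127)."""
    return byte - 256 if byte >= 128 else byte

# Print each address and content in binary, with both human readings.
for address, byte in enumerate(memory):
    print(f"{address:08b}  {byte:08b}  "
          f"unsigned={as_unsigned(byte)}  signed={as_signed(byte)}")
```

The bits are the same in both readings; which one is "right" is decided by whoever wrote the program, not by the memory.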

Memory Unit

- Each byte in memory is labeled by a unique address
  - We talk of a byte-addressable memory
- All addresses on a computer have the same number of bits (e.g., 16-bit addresses)
- The CPU has instructions like “Read the byte at address X and give me its value” and “Write this value into the byte at address Y”
- The Memory Unit (Bus + RAM) has the hardware to make these instructions happen
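As a rough sketch of the two instructions quoted above, a byte-addressable memory with 16-bit addresses can be modeled in Python as an array of 2^16 bytes. The names `read_byte` and `write_byte` are hypothetical, and unlike real RAM this model starts out zero-filled.

```python
ADDRESS_BITS = 16                      # example address width from the slide
memory = bytearray(2 ** ADDRESS_BITS)  # one byte per address: 65,536 bytes
                                       # (assumption: zero-filled, real RAM is not)

def read_byte(address):
    """'Read the byte at address X and give me its value'."""
    return memory[address]

def write_byte(address, value):
    """'Write this value into the byte at address Y' (only the low 8 bits fit)."""
    memory[address] = value & 0xFF

write_byte(0b0000000000000011, 0b00000001)
print(f"{read_byte(0b0000000000000011):08b}")
```

With 16-bit addresses the highest address is 1111 1111 1111 1111, so this memory cannot hold more than 64 KiB.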

Conceptual View of Memory (16-bit addresses example)

Address                Content
0000 0000 0000 0000    0000 0011
0000 0000 0000 0001    0000 0001
0000 0000 0000 0010    0000 0100
0000 0000 0000 0011    0000 0001
0000 0000 0000 0100    0000 0101
0000 0000 0000 0101    0000 1001
0000 0000 0000 0110    0000 0010
0000 0000 0000 0111    0000 0110
0000 0000 0000 1000    0000 0101
......
1111 1111 1111 1111    0010 0101

- At address 0000 0000 0000 0011 the content is 0000 0001
- (The contents of uninitialized memory are random)

Conceptual View of Memory (8-bit addresses example)

- Let’s consider a memory with 8-bit addresses and this initial state
- We can write a program that does “At address 1000 0000, store the address of the first ’9’ (0000 1001) in memory”

Before                          After
Address      Content            Address      Content
0000 0000    0000 0011          0000 0000    0000 0011
0000 0001    0000 0001          0000 0001    0000 0001
0000 0010    0000 0100          0000 0010    0000 0100
0000 0011    0000 0001          0000 0011    0000 0001
0000 0100    0000 0101          0000 0100    0000 0101
0000 0101    0000 1001    =⇒    0000 0101    0000 1001
0000 0110    0000 0010          0000 0110    0000 0010
0000 0111    0000 0110          0000 0111    0000 0110
0000 1000    0000 0101          0000 1000    0000 0101
......                          ......
1000 0000    0110 0101          1000 0000    0000 0101
1000 0001    1001 0111          1000 0001    1001 0111
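The program described above can be sketched in a few lines of Python. This is only an illustration: the function name `store_address_of_first` is made up, and the rows hidden behind "......" are assumed to contain zeros.

```python
memory = bytearray(256)  # 8-bit addresses: 256 bytes (elided rows assumed to be 0)

# Initial state from the table, addresses 0000 0000 through 0000 1000.
initial = [0x03, 0x01, 0x04, 0x01, 0x05, 0x09, 0x02, 0x06, 0x05]
memory[0:len(initial)] = bytes(initial)
memory[0b10000000] = 0b01100101
memory[0b10000001] = 0b10010111

def store_address_of_first(value, destination):
    """Scan memory from address 0; store the address of the first match at `destination`."""
    for address in range(len(memory)):
        if memory[address] == value:
            memory[destination] = address
            return address
    return None  # value not found: memory is left unchanged

store_address_of_first(0b00001001, 0b10000000)
print(f"{memory[0b10000000]:08b}")
```

After the call, the byte at 1000 0000 no longer holds a "number" in the usual sense: it holds 0000 0101, the address where the 9 lives.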

Indirection

- An address is just information
- In the previous slide we’ve done indirection
  - The content at a memory location is the address of another memory location: we call this a pointer/reference
  - At that other memory location is some content that we care about, which in our case is the value ’9’, but which could be yet another address
- It’s the job of the programmer to know what memory content means (the CPU has no idea), which is a source of bugs
  - Very well-known difficulty when writing assembly (ICS312/ICS331)
  - High-level programming languages help, but in C you can do whatever: e.g., on a typical 64-bit (LP64) system a C pointer has the same size as an unsigned long, so any integer can be cast to a pointer

    unsigned long x = 42;
    int *ptr = (int *)x; // bogus pointer!
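To make the indirection concrete, here is a small Python sketch that follows a pointer in a simulated 256-byte memory. The function name `dereference` and the extra bytes placed at 1000 0001 and 1000 0010 are assumptions added for illustration only.

```python
memory = bytearray(256)
memory[0b00000101] = 0b00001001  # the content we care about: the value 9
memory[0b10000000] = 0b00000101  # a pointer: the address of that 9
memory[0b10000001] = 0b10000000  # (assumption) a pointer to the pointer
memory[0b10000010] = 42          # (assumption) just a number, not meant as an address

def dereference(address):
    """Treat the byte stored at `address` as an address and read what it points to."""
    pointer = memory[address]
    return memory[pointer]

print(dereference(0b10000000))          # follows one pointer to the 9
print(memory[dereference(0b10000001)])  # follows two pointers to the same 9
print(dereference(0b10000010))          # "works", but 42 was never meant as an address
```

The last call is the simulated version of the bogus C pointer: nothing stops it, and it simply returns whatever happens to be at address 42.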

Hello World! (Well... not really)

Let’s consider the following pseudo-code:

Step 1) Set the content of variable A to the content at address 1000 0000
Step 2) Set the content of variable B to the content at address 1000 0001
Step 3) Add A and B together and store the result in A
Step 4) Set the content at address 1000 0001 to the content of A
Step 5) Go back to Step 1

or in assembly (pseudo-)instructions:

// MIPS-like (ICS 331)          // x86-like (ICS 312)
S1: LOAD A, (1000 0000)         S1: MOV AL, [1000 0000]
S2: LOAD B, (1000 0001)         S2: MOV BL, [1000 0001]
S3: ADD A, B                    S3: ADD AL, BL
S4: STORE A, (1000 0001)        S4: MOV [1000 0001], AL
S5: JMP S1                      S5: JMP S1
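The five pseudo-code steps can be traced with a short Python sketch. It is only an approximation: the loop is cut off after a fixed number of iterations (the original never stops), registers are assumed to be 8 bits wide, and the starting values 5 and 151 are arbitrary choices for the demonstration.

```python
def run_loop(memory, iterations):
    """Repeat Steps 1-4 `iterations` times; return the byte at address 1000 0001."""
    for _ in range(iterations):        # Step 5, bounded here instead of looping forever
        a = memory[0b10000000]         # Step 1
        b = memory[0b10000001]         # Step 2
        a = (a + b) & 0xFF             # Step 3 (assumption: 8-bit register, wraps around)
        memory[0b10000001] = a         # Step 4
    return memory[0b10000001]

memory = bytearray(256)
memory[0b10000000] = 5     # assumed starting value
memory[0b10000001] = 151   # assumed starting value
print(run_loop(memory, 3))  # 151 + 5 + 5 + 5
```

Each pass adds the byte at 1000 0000 to the byte at 1000 0001, so the second byte keeps growing until it wraps around past 255.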

Binary Instruction Encoding

- Instructions are encoded in binary, based on the specification of the microprocessor your computer uses
- Here are some x86 instruction encodings:

Instruction          Encoding (in hex)    Size
ADD EAX, 1           83C001               3 bytes
ADD EAX, -1          83C0FF               3 bytes
ADD EAX, -100000     056079FEFF           5 bytes
ADD EAX, EBX         01D8                 2 bytes

- Some instructions are shorter than others, which impacts the size of the executable
- An assembler transforms assembly code into binary code, so programmers typically don’t know the binary code for instructions
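The first three rows of the table can be reproduced with a few lines of Python. This is a sketch of just those two encodings, not an assembler; the function name `encode_add_eax_imm` is made up, and the choice between the short and long form simply mirrors what the table shows.

```python
def encode_add_eax_imm(value):
    """Encode `ADD EAX, value` as in the table above (illustrative helper only)."""
    if -128 <= value <= 127:
        # Short form: opcode 83, ModRM byte C0 (selects EAX), 8-bit signed immediate.
        return bytes([0x83, 0xC0]) + value.to_bytes(1, "little", signed=True)
    # Long form: opcode 05, 32-bit signed immediate, least significant byte first.
    return bytes([0x05]) + value.to_bytes(4, "little", signed=True)

for value in (1, -1, -100000):
    code = encode_add_eax_imm(value)
    print(f"ADD EAX, {value}: {code.hex().upper()} ({len(code)} bytes)")
```

Note how -1 becomes the single byte FF, while -100000 becomes 60 79 FE FF: the 32-bit two's-complement value FFFE7960 stored lowest byte first.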

The program is stored in RAM along with data

Address      Content (hex)    Meaning
0000 0000    83               ADD EAX, 1
0000 0001    C0
0000 0010    01
0000 0011    01               ADD EAX, EBX
0000 0100    D8
0000 0101    05               ADD EAX, -100000
0000 0110    60
0000 0111    79
0000 1000    FE
0000 1001    FF
......
1000 0000    05               Some data
1000 0001    4F               Some data
1000 0010    2C               Some data
1000 0011    00               Some data

- Once a program is loaded in memory, its address space contains both code and data
- The CPU can’t tell the difference, only the programmer can
- This is conveniently hidden from the programmer, unless you write assembly
- It’s the CPU’s job to understand that 83C001 means ADD EAX, 1

Memory Unit: Conclusions

- The memory is basically an indexed array of bytes
- The memory contents have various useful meanings:
  - integers, character codes, floating-point numbers, ...
  - but also higher-level abstractions: RGB values, coordinates in space-time, images, ...
  - addresses (pointers)
  - instructions (i.e., executable code) understood by a CPU


Central Processing Unit

- The CPU reads data from memory into registers, writes data from registers to memory, and computes
- The component that performs the computational operations is called the ALU (Arithmetic and Logic Unit)
  - It can perform what you expect (+, -, /, *, OR, AND, XOR, ...)
- Operands and results of operations must all be in registers
- Unfortunately, there are very few registers
  - e.g., Intel i7: 8 × 32-bit; 16 × 64-bit (and 16 FP 128- or 256-bit)
  - This is a pain when writing assembly by hand
  - But the compiler does all that work for us when we use high-level languages

Central Processing Unit

- The CPU also controls the execution of the program’s instructions
- The Control Unit is the component in charge of controlling the program execution, and it uses dedicated registers:
  - Program Counter: contains the address of the next instruction that should be executed; it is incremented after each instruction but can be set to whatever address when there is a change in control flow
  - Current Instruction: the binary code of the instruction which is currently being executed
  - Other registers: Stack Pointer, Frame Pointer, ...
- The Control Unit decodes the instructions (i.e., interprets their bits) and makes them happen
- This is a main topic of a Computer Architecture course

Fetch-Decode-Execute Cycle

The Fetch-Decode-Execute Cycle

- The Control Unit fetches the next program instruction from memory using the program counter
- The instruction is decoded and signals are sent to hardware components (memory controller, ALU, I/O controller)
- The instruction is executed:
  - Values are fetched from memory and put in the registers
  - Computation is performed by the ALU and results are stored in registers
  - Register values are pushed back to memory
  - Program state is modified (Program Counter, Stack Pointer, ...)
- Repeat
- Computers implement many variations on this cycle, with tons of bells and whistles to make it as fast as possible
- But one can still program with the above model in mind (but certainly without fully understanding performance issues)

Fetch-Decode-Execute

- Let’s consider a simplistic hypothetical Von Neumann architecture
  - Memory contains 256 × 1 byte
  - CPU has 2 “data” registers (A and B), 2 “control” registers (Program Counter and Current Instruction)
  - CPU instructions encoded on 1 byte (8 bits): 3-bit “opcode” (operation code) and 5-bit operands:
    - Opcode 000: Load to register A from memory
    - Opcode 001: Load to register B from memory
    - Opcode 010: Add B to A; store the result in A
    - Opcode 011: Store the value of A to memory
    - Opcode 100: Jump
    - Opcode 111: Halt (program terminates)
- We will assume that initially A = 5 and B = 151

Sample Execution Decoding

From the previous slide, our instructions are as follows:
  Opcode 000: Load to register A from memory
  Opcode 001: Load to register B from memory
  Opcode 010: Add B to A; store the result in A
  Opcode 011: Store the value of A to memory
  Opcode 100: Jump
  Opcode 111: Halt (program terminates)
So, for instance, here are the meanings of example instructions:
  00010111: Load the byte in RAM at address 00010111 into register A (“LOAD A, (10111)” in MIPS-like assembly)
  010?????: A = A + B (we don’t care what the 5 trailing bits are because this instruction takes no operand)
  10000011: Jump to the instruction at address 00000011 and execute it
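The decoding rule above can be sketched in a few lines of Python. This is only an illustration: the `decode` function and the mnemonic strings in `MNEMONICS` are names chosen here, not something defined in the slides.

```python
# Hypothetical helper: split one instruction byte of the toy machine
# into its 3-bit opcode and its 5-bit operand.
MNEMONICS = {
    0b000: "LOAD A",
    0b001: "LOAD B",
    0b010: "ADD A, B",
    0b011: "STORE A",
    0b100: "JMP",
    0b111: "HALT",
}


def decode(instruction):
    """Return (mnemonic, operand) for an 8-bit instruction.

    Opcodes 101 and 110 are not defined by the slides, so they
    raise a KeyError here.
    """
    opcode = (instruction >> 5) & 0b111   # top 3 bits
    operand = instruction & 0b11111       # bottom 5 bits
    return MNEMONICS[opcode], operand


for byte in (0b00010111, 0b01000000, 0b10000011):
    name, operand = decode(byte)
    print(f"{byte:08b}: {name}, operand = {operand:05b}")
```

For instructions that take no operand (ADD, HALT) the returned operand is simply ignored by the caller.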

(Initialization)-Fetch-Decode-Execute

CPU: A = undefined, B = undefined, PC = undefined, CI = undefined

Memory:
Address     Content     Meaning
0000 0100   000 10000   LOAD A, (10000)
0000 0101   001 10001   LOAD B, (10001)
0000 0110   010 00000   ADD A, B
0000 0111   011 10001   STORE A, (10001)
0000 1000   100 00100   JMP (00100)
...
0001 0000   0000 0101   5d
0001 0001   1001 0111   151d
...

The Program (its Code and its Data) is loaded into memory (Guess who does that?)

(Initialization)-Fetch-Decode-Execute

CPU: A = undefined, B = undefined, PC = 0000 0100, CI = undefined
Memory: unchanged

The Program Counter is set to the address of the first instruction of the program (Guess who does that?)

Fetch-Decode-Execute

Memory is unchanged during this cycle (program at 0000 0100 to 0000 1000; data 5d at 0001 0000 and 151d at 0001 0001)

A request is put on the Address Bus to retrieve the value in memory at address PC = 0000 0100
  CPU: A = undefined, B = undefined, PC = 0000 0100, CI = undefined

The Memory Unit puts the requested data on the Data Bus and the CPU puts it into the CI register
  CPU: A = undefined, B = undefined, PC = 0000 0100, CI = 0001 0000

The PC register value is incremented: its new value is the address of the next instruction to execute
  CPU: A = undefined, B = undefined, PC = 0000 0101, CI = 0001 0000

The instruction is decoded: 00010000 means “000 = LOAD A from address 000(10000)”
  CPU: A = undefined, B = undefined, PC = 0000 0101, CI = 0001 0000

The instruction is executed: The value of the memory at address 00010000 is requested (using the address bus)
  CPU: A = undefined, B = undefined, PC = 0000 0101, CI = 0001 0000

The instruction is executed: The content at address 10000, that is, 0000 0101, is put on the Data Bus and written to register A
  CPU: A = 0000 0101, B = undefined, PC = 0000 0101, CI = 0001 0000

Fetch-Decode-Execute-(Repeat)

Repeat!

Fetch-Decode-Execute

Memory is unchanged during this cycle

Fetch (Note that the value of PC is incremented)
  CPU: A = 0000 0101, B = undefined, PC = 0000 0110, CI = 0011 0001

The instruction is decoded: 00110001 means “001 = LOAD B from address 000(10001)”
  CPU: A = 0000 0101, B = undefined, PC = 0000 0110, CI = 0011 0001

The instruction is executed: Value read at address 00010001, that is, 1001 0111, is written to register B
  CPU: A = 0000 0101, B = 1001 0111, PC = 0000 0110, CI = 0011 0001

Fetch-Decode-Execute

Memory is unchanged during this cycle

Fetch
  CPU: A = 0000 0101, B = 1001 0111, PC = 0000 0111, CI = 0100 0000

The instruction is decoded: 01000000 means “010 = ADD A, B (the operand is ignored)”
  CPU: A = 0000 0101, B = 1001 0111, PC = 0000 0111, CI = 0100 0000

The instruction is executed (A ← A + B = 5d + 151d = 156d)
  CPU: A = 1001 1100, B = 1001 0111, PC = 0000 0111, CI = 0100 0000

Fetch-Decode-Execute

Fetch
  CPU: A = 1001 1100, B = 1001 0111, PC = 0000 1000, CI = 0111 0001

(Let’s skip the Decode part) Execute
  CPU: A = 1001 1100, B = 1001 0111, PC = 0000 1000, CI = 0111 0001
  Memory: address 0001 0001 now holds 1001 1100 (156d)

Fetch-Decode-Execute

Fetch
  CPU: A = 1001 1100, B = 1001 0111, PC = 0000 1001, CI = 1000 0100
  Memory: address 0001 0001 holds 1001 1100 (156d)

Fetch-Decode-Execute

Execute - the JMP instruction modifies the value of a control register (PC)
  CPU: A = 1001 1100, B = 1001 0111, PC = 0000 0100, CI = 1000 0100
  Memory: address 0001 0001 holds 1001 1100 (156d)
The next instruction to execute will be LOAD A, (10000)
And like that we have implemented an infinite loop...
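The whole walkthrough can be replayed with a small simulator. The sketch below follows the opcodes and memory contents from the slides, but the function names (`make_memory`, `step`), the use of 0 for "undefined" registers, and the 8-bit wraparound on ADD are assumptions made for this example.

```python
# A sketch of the toy machine from the slides (names are hypothetical).
def make_memory():
    memory = [0] * 256                 # 256 x 1 byte
    contents = {
        0b00000100: 0b00010000,        # LOAD A, (10000)
        0b00000101: 0b00110001,        # LOAD B, (10001)
        0b00000110: 0b01000000,        # ADD A, B
        0b00000111: 0b01110001,        # STORE A, (10001)
        0b00001000: 0b10000100,        # JMP (00100)
        0b00010000: 5,                 # data: 5d
        0b00010001: 151,               # data: 151d
    }
    for address, value in contents.items():
        memory[address] = value
    return memory


def step(cpu, memory):
    """Run one fetch-decode-execute cycle; return False on HALT."""
    # Fetch: read the instruction at PC into CI, then increment PC.
    cpu["CI"] = memory[cpu["PC"]]
    cpu["PC"] = (cpu["PC"] + 1) & 0xFF
    # Decode: 3-bit opcode, 5-bit operand.
    opcode, operand = cpu["CI"] >> 5, cpu["CI"] & 0b11111
    # Execute.
    if opcode == 0b000:
        cpu["A"] = memory[operand]
    elif opcode == 0b001:
        cpu["B"] = memory[operand]
    elif opcode == 0b010:
        cpu["A"] = (cpu["A"] + cpu["B"]) & 0xFF   # assumed: 8-bit wraparound
    elif opcode == 0b011:
        memory[operand] = cpu["A"]
    elif opcode == 0b100:
        cpu["PC"] = operand
    elif opcode == 0b111:
        return False
    else:
        raise ValueError(f"undefined opcode {opcode:03b}")
    return True


memory = make_memory()
cpu = {"A": 0, "B": 0, "PC": 0b00000100, "CI": 0}   # "undefined" modeled as 0
for _ in range(5):                                   # one pass through the loop
    step(cpu, memory)
print(f"A = {cpu['A']:08b}, PC = {cpu['PC']:08b}, "
      f"memory[10001] = {memory[0b10001]:08b}")
```

Because the program never reaches a Halt, the example runs a fixed number of cycles instead of looping until termination.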

Fetch-Decode-Execute Practice

It’s a pretty good idea to review these slides: go back to the first slide (initialization) and see if you can go through the fetch-decode-execute cycle yourself

We’ll have a simple homework assignment along these lines

But just in case, let’s do one together right now...

In-class activity (just to make sure we’re all on board)

CPU: A = undefined, B = 0000 0110, PC = 0000 0001, CI = undefined

Memory:
Address     Content
0000 0000   001 10010
0000 0001   000 10011
0000 0010   010 00000
0000 0011   011 10111
0000 0100   001 10111
0000 0101   111 00000
...
0001 0010   0000 0110
0001 0011   1000 0111

Opcode   Meaning                              5-bit operand
000      Load to register A from memory       address
001      Load to register B from memory       address
010      Add B to A; store the result in A    ignored
011      Store the value of A to memory       address
100      Jump                                 address
111      Halt                                 ignored

What is the decimal value of register B when the program terminates?

In-class activity solution

CPU: A = 1000 1101, B = 1000 1101, PC = 0000 0110, CI = 1110 0000

Memory:
Address     Content     Meaning
0000 0000   001 10010   B ← (00010010) = 6d
0000 0001   000 10011   A ← 135d
0000 0010   010 00000   A ← 135d + 6d = 141d
0000 0011   011 10111   (00010111) ← 141d
0000 0100   001 10111   B ← (00010111) = 141d
0000 0101   111 00000   Halt
...
0001 0010   0000 0110   6d
0001 0011   1000 0111   135d

Opcode   Meaning                              5-bit operand
000      Load to register A from memory       address
001      Load to register B from memory       address
010      Add B to A; store the result in A    ignored
011      Store the value of A to memory       address
100      Jump                                 address
111      Halt                                 ignored

Answer: the decimal value of B is 141
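One way to double-check the answer is to run the activity's memory image through a tiny interpreter. This is a sketch only: `run` is a name chosen here, undefined registers are modeled as 0, and execution starts at address 0000 0000 (the slide shows the state after the first instruction, which gives the same result).

```python
# Hypothetical checker for the in-class activity.
def run(memory, pc, max_steps=1000):
    a = b = 0
    for _ in range(max_steps):            # guard against infinite loops
        instruction = memory[pc]          # fetch
        pc += 1
        opcode, operand = instruction >> 5, instruction & 0b11111  # decode
        if opcode == 0b000:               # load A from memory
            a = memory[operand]
        elif opcode == 0b001:             # load B from memory
            b = memory[operand]
        elif opcode == 0b010:             # add B to A (assumed 8-bit wraparound)
            a = (a + b) & 0xFF
        elif opcode == 0b011:             # store A to memory
            memory[operand] = a
        elif opcode == 0b100:             # jump
            pc = operand
        elif opcode == 0b111:             # halt
            return a, b, pc
        else:
            raise ValueError("undefined opcode")
    raise RuntimeError("program did not halt")


memory = [0] * 256
memory[0:6] = [0b00110010, 0b00010011, 0b01000000,
               0b01110111, 0b00110111, 0b11100000]
memory[0b10010] = 0b00000110   # 6d
memory[0b10011] = 0b10000111   # 135d

a, b, pc = run(memory, pc=0)
print(a, b, pc)
```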

There is more to Fetch-Decode-Execute

This was a simplified view of the way things work
Control and data paths are implemented by several hardware components
There is usually more than one ALU
There are caches between the CPU and the memory
There are even multiple CPUs
The cycle is pipelined: Fetch the instruction i + 1 while instruction i is being executed
Decades of computer architecture research have gone into improving speed, thus often leading to high hardware complexity (and doing smart things in hardware requires more logic gates and wires, thus increasing CPU cost)
But, conceptually, it is still Fetch-Decode-Execute.
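To get a feel for why pipelining helps, here is a back-of-the-envelope model. It is an idealization that is not in the slides: it assumes exactly three stages (fetch, decode, execute), one clock cycle per stage, and no stalls. The function names are hypothetical.

```python
# Idealized model (an assumption): three stages, one cycle each, no stalls.
STAGES = 3


def cycles_sequential(n_instructions):
    # Each instruction finishes all stages before the next one starts.
    return STAGES * n_instructions


def cycles_pipelined(n_instructions):
    # The first instruction takes STAGES cycles; after that, one
    # instruction completes every cycle.
    if n_instructions == 0:
        return 0
    return STAGES + (n_instructions - 1)


for n in (1, 10, 1000):
    print(n, cycles_sequential(n), cycles_pipelined(n))
```

Under these assumptions the speedup approaches the number of stages as the instruction count grows; real pipelines fall short of that because of stalls.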

The Von Neumann Architecture

[Diagram: CPU ⇐⇒ Memory, with a vertical two-way arrow (⇕) to I/O]

I/O

Let’s leave this topic for (much) later...

Let’s just assume that there is an I/O Controller and that the CPU can talk to it to make I/O happen (reads and writes)
After all, there is a Memory Controller, and at the conceptual level they are not so different

The RAM is slow

A big speed issue: the memory is slow
Accessing a register is very fast
  e.g., a 4GHz CPU can update a register in 0.25 nanosecond (1 cycle)
Accessing the memory takes about 10 ns
The memory is ∼40 times slower than the CPU
What does the CPU do while it’s waiting for the memory to give it data?
  NOTHING!! (yes, this is a problem)
This is the famous “Von-Neumann Bottleneck”

Many techniques have been developed to address this
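The "∼40 times slower" figure follows directly from the two numbers on the slide. The short calculation below spells it out; the variable names are chosen for this example, and times are kept in integer picoseconds so the arithmetic is exact.

```python
# Values from the slide; names are hypothetical.
CLOCK_HZ = 4_000_000_000              # 4 GHz
MEMORY_ACCESS_PS = 10_000             # about 10 ns, in picoseconds

cycle_ps = 10**12 // CLOCK_HZ         # duration of one clock cycle
stall_cycles = MEMORY_ACCESS_PS // cycle_ps

print(f"one cycle = {cycle_ps} ps")
print(f"one memory access = {stall_cycles} cycles")
```

In other words, a CPU that waits on every memory access could have run roughly 40 register-level operations in that time.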

Several levels of RAM

We would like a gigantic and fast memory
Could we just build the memory as gazillions of registers?
  No!!! Cost/physics make it impossible
Instead, we play a trick to provide the illusion of a fast memory
This trick is called the memory hierarchy

The Memory Hierarchy

Left to right: from fast and small to slow and large

Level         (CPU) Registers   Cache      Memory   I/O Devices
Size          Few 100s Bytes    kB to MB   GB       TB
Access time   < 1 ns            1 ns       10 ns    1+ ms
Managed by    Compiler          Hardware   OS       OS

The Memory Bus connects the CPU and its cache to the Memory; the I/O Bus connects to the I/O Devices

The Memory Hierarchy in a Nutshell

When a program accesses a byte in memory:
  It checks whether the byte is in cache, and if so, it just gets it
  Otherwise, the byte value is brought from the (slow) memory into the (fast) cache
  The values around the byte are also brought into the cache
Analogy: To write a paper you need a reference book from the library
  You go to the library and find the book on a shelf, noticing that the books around it are on the same topic!
  You can...
    Leave the book at the library and go to the library each time you need one reference
    Take only the one book... but if it makes a reference to another book on the same topic you’ll have to go back to the library
    Or take the one book and the books around it and put them on your desk... and if THE reference makes a reference maybe you’ll have the referred book right there
  In this last option your desk is a “cache for the library”
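The lookup rule described above (check the cache, otherwise bring in the byte together with its neighbors) can be sketched as a toy model. `ToyCache`, `LINE_SIZE`, and the unbounded capacity are assumptions made for this example; real caches are finite and must evict old data.

```python
# A toy cache model (hypothetical names, assumed parameters).
LINE_SIZE = 8   # bytes brought in together on a miss (assumed)


class ToyCache:
    def __init__(self):
        self.lines = set()    # numbers of the lines currently cached
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line = address // LINE_SIZE
        if line in self.lines:
            self.hits += 1            # the byte is already "on the desk"
        else:
            self.misses += 1          # go to the (slow) memory...
            self.lines.add(line)      # ...and bring the neighbors along


cache = ToyCache()
for address in range(64):             # read 64 consecutive bytes
    cache.access(address)
print(cache.hits, cache.misses)
```

With these assumed parameters, only the first byte of each 8-byte line misses; the other seven accesses find their data already in the cache.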

Why does it work?

TEMPORAL LOCALITY

A program tends to reference addresses it has already referenced
  e.g., Counters
The first access is expensive: Fetching the value takes many cycles
Each subsequent access is cheap: The value is in cache

The “I need that same book again” analogy

Why does it work?

SPATIAL LOCALITY

A program tends to reference addresses next to addresses it has already referenced
  e.g., When manipulating arrays (i.e., contiguous bytes in memory)
  The access to element i is expensive: Fetching the value takes many cycles
  Accesses to elements i + 1, i + 2, ... are cheap: The values are in cache!

The “I need a book on that same shelf” analogy
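Both kinds of locality can be observed by counting hits in a small simulated cache. The sketch below assumes a 4-line cache with 64-byte lines and least-recently-used eviction. These parameters and the function name `hit_rate` are hypothetical choices for illustration, not a description of any particular processor.

```python
from collections import OrderedDict


def hit_rate(addresses, line_size=64, capacity=4):
    """Fraction of accesses that hit in a toy LRU cache of `capacity` lines."""
    cache = OrderedDict()  # line numbers, ordered from least to most recently used
    hits = 0
    for address in addresses:
        line = address // line_size
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = None             # miss: load the line
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)


# Temporal locality: the same counter address is accessed 100 times.
counter = hit_rate([4096] * 100)

# Spatial locality: walk through 1024 contiguous bytes, one byte at a time.
sequential = hit_rate(range(1024))

# No locality: every access jumps ahead by one full line.
strided = hit_rate(range(0, 1024 * 64, 64))
```

Under these assumptions the counter misses only once, the sequential walk misses once per 64-byte line, and the strided walk misses on every access.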

The Memory Hierarchy: Memory Caches

In reality there is more than one level of cache (L1, L2, L3)
  Trade-offs between size, speed, and cost
  L1 (the closest/fastest to the CPU) is actually split into Data Cache and Instruction Cache
Chunks of data are brought from (far-away) memory and are copied and kept around in (nearby) caches
The same data exist in multiple levels of memory at once, which leads to interesting issues/problems we might discuss (see ICS 432)
Cache Hit: When a data item is found in cache (e.g., we would talk of an “L2 cache hit”)
Cache Miss: When a data item is not found in cache (e.g., we would talk of an “L1 cache miss”)
We’ll use this hit/miss terminology for several OS concepts...
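The hit/miss terminology across several levels can be illustrated with a small sketch. It assumes that a line fetched on a miss is copied into every level and that nothing is ever evicted. Real processors use a variety of policies, so the function `lookup` is a hypothetical simplification.

```python
def lookup(address, levels, line_size=64):
    """Return the name of the level that serves the access ("RAM" if every cache misses)."""
    line = address // line_size
    served_by = "RAM"
    for name, resident in levels:
        if line in resident:  # hit at this level
            served_by = name
            break
    # Simplifying assumption: the line is then kept in every cache level.
    for _, resident in levels:
        resident.add(line)
    return served_by


# Each level is a (name, set of resident line numbers) pair, closest to the CPU first.
levels = [("L1", set()), ("L2", set()), ("L3", set())]

a = lookup(0, levels)  # L1, L2, and L3 misses: served by RAM
b = lookup(8, levels)  # L1 hit: same line as address 0
levels[0][1].clear()   # pretend L1 has evicted everything
c = lookup(8, levels)  # L1 miss, then L2 hit
```

After the first access the same line is present in all three sets at once, which is the duplication mentioned above.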

Direct Memory Access (DMA)

Often, one has to copy large chunks of data to/from RAM from/to some peripheral device (graphics card, network card, sound card, disk)
In the pure Von-Neumann model, the CPU has to be involved for each copy operation
The problem is that memory copies take a long time (even with caches), and the CPU spends its life twiddling its thumbs while the copies are taking place
It would be better to have copies occur independently so that the CPU can do something useful while the memory copy is taking place
This is called Direct Memory Access (DMA)

Direct Memory Access (DMA)

DMA is used on all modern computers
  e.g., the Intel i7 has an on-chip DMA controller
How DMA works (without getting into details):
  The CPU simply tells the DMA controller to initiate a RAM copy
  When the copy is complete the DMA controller tells the CPU “it’s done” by generating an interrupt (more on interrupts very soon)
  In the meantime, the CPU is free to do whatever
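The three steps above can be mimicked in software as a rough analogy. In this sketch a background thread stands in for the DMA controller and a `threading.Event` stands in for the completion interrupt. The function `dma_copy` and the buffer sizes are hypothetical, and real DMA is programmed through hardware registers rather than threads.

```python
import threading


def dma_copy(src, dst, on_complete):
    """Start copying src into dst in the background; call on_complete when done."""
    def controller():
        dst[:len(src)] = src  # the transfer itself, done without the main thread
        on_complete()         # stands in for the completion interrupt
    worker = threading.Thread(target=controller)
    worker.start()
    return worker


src = bytes(range(256)) * 4096  # 1 MiB of data "on the device"
dst = bytearray(len(src))       # destination buffer "in RAM"
done = threading.Event()

# 1. The "CPU" tells the "DMA controller" to initiate the copy.
dma_copy(src, dst, on_complete=done.set)

# 2. In the meantime, the "CPU" does something else.
useful_work = sum(i * i for i in range(1000))

# 3. The "interrupt" signals completion. A real CPU would not block here;
#    waiting only lets this script check the result.
done.wait()
```

The point of the design is that step 2 does not depend on the copy: the main thread only learns about the transfer again when the completion signal arrives.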

DMA is not free

To perform data transfers the DMA controller uses the memory bus
In the meantime, the code executed by the CPU likely also uses the memory bus
Therefore, they can interfere with each other
There are several ways in which this interference can be managed (give priority to DMA, to CPU, weight usage, ...)
  See a Computer Architecture course
In general, using DMA leads to much better performance anyway and (good) software should use it as often as possible
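To make the arbitration options concrete, here is a minimal sketch in which the bus serves exactly one requester per cycle. The function `share_bus` and the policy names `cpu_first`, `dma_first`, and `alternate` are invented for this example. Actual bus arbiters are hardware mechanisms and are considerably more involved.

```python
def share_bus(cpu_pending, dma_pending, policy):
    """Return the order in which bus cycles are granted, one requester per cycle."""
    grants = []
    turn = 0
    while cpu_pending or dma_pending:
        if not dma_pending:
            owner = "CPU"  # only the CPU still has requests
        elif not cpu_pending:
            owner = "DMA"  # only the DMA controller still has requests
        elif policy == "cpu_first":
            owner = "CPU"
        elif policy == "dma_first":
            owner = "DMA"
        else:  # "alternate": equal weights, taking turns
            owner = "CPU" if turn % 2 == 0 else "DMA"
            turn += 1
        if owner == "CPU":
            cpu_pending -= 1
        else:
            dma_pending -= 1
        grants.append(owner)
    return grants


cpu_priority = share_bus(2, 3, "cpu_first")
dma_priority = share_bus(2, 3, "dma_first")
fair = share_bus(2, 3, "alternate")
```

Whatever the policy, the total number of bus cycles is the same; what changes is which side waits.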

Current Architectures

Current architectures are much more complex than what we just described
Because manufacturers cannot increase clock rates further (power/heat issues), our current CPUs are multi-core
  Multiple “low” clock rate CPUs on a single chip
This is a great solution to a problem, but most users/programmers would rather have a single 100 GHz core than fifty 2 GHz cores
We’ll talk about multi-core architectures later in the semester

Example of a real-life system

Picture obtained with lstopo (sudo apt-get install hwloc)

Conclusion

If you want to know more:
  Take ICS312 / ICS331
  Take Computer Architecture (EE 461, ICS 431)
  Computer Organization and Design, Patterson and Hennessy

We will have a quiz on these lecture notes next week
